/opt/sohuhadoop/hadoop/bin/hadoop-daemon.sh start tasktracker
<property>
<name>dfs.balance.bandwidthPerSec</name>
<value>10485760</value>
<description>
Specifies the maximum bandwidth that each datanode can utilize for balancing, in bytes per second.
</description>
</property>
Balancing took 2.9950980555555557 hours
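A quick arithmetic check of the value above, plus a sketch of kicking off the balancer by hand (the -threshold percentage is illustrative, not taken from this post):

```shell
# dfs.balance.bandwidthPerSec is given in bytes per second,
# so the 10485760 above is a 10 MB/s cap per datanode.
BW=$((10 * 1024 * 1024))
echo "$BW"   # prints 10485760
# To run the balancer manually (script ships with Hadoop 0.20; not executed here):
# /opt/sohuhadoop/hadoop/bin/start-balancer.sh -threshold 5
```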
December 13th, 2010 - Removing nodes from a Hadoop cluster with Decommission
10.15.10.42
10.15.10.43
<property>
<name>dfs.hosts.exclude</name>
<value>/opt/sohuhadoop/conf/excludes</value>
<final>true</final>
</property>
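A sketch of the decommission flow using the two hosts above (writing to ./excludes here for illustration; the cluster path is /opt/sohuhadoop/conf/excludes as configured):

```shell
# The excludes file simply lists one host per line.
printf '10.15.10.42\n10.15.10.43\n' > excludes
cat excludes
# Then make the NameNode re-read dfs.hosts.exclude (run on the NameNode):
# hadoop dfsadmin -refreshNodes
# The nodes then show up as "Decommission In Progress" until their blocks
# have been re-replicated elsewhere.
```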
November 19th, 2010 - Setting up an HBase environment on Hadoop
Download the source tarball and extract it:
cd /opt/hadoop/
tar zxvf hbase-0.20.6.tar.gz
ln -s hbase-0.20.6 hbase
export HBASE_LOG_DIR=/opt/log/hbase
export HBASE_MANAGES_ZK=true
<property>
<name>hbase.rootdir</name>
<value>hdfs://zw-hadoop-master:9000/hbase</value>
<description>The directory shared by region servers.</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
<property>
<name>hbase.master</name>
<value>hdfs://zw-hadoop-master:60000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>zw-hadoop-slave225,zw-hadoop-slave226,zw-hadoop-slave227</value>
<description>Comma separated list of servers in the ZooKeeper Quorum. For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com". By default this is set to localhost for local and pseudo-distributed modes of operation. For a
fully-distributed setup, this should be set to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers which we will start/stop ZooKeeper on.
</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/opt/log/zookeeper</value>
<description>Property from ZooKeeper's config zoo.cfg.
The directory where the snapshot is stored.
</description>
</property>
hbase.rootdir sets the HBase directory on HDFS; the hostname must be the host where the HDFS NameNode runs
hbase.cluster.distributed set to true means this is a fully distributed HBase cluster
hbase.master sets the HBase master's hostname and port
hbase.zookeeper.quorum sets the ZooKeeper hosts; the official recommendation is an odd number of nodes such as 3, 5, or 7
/opt/sohuhadoop/hbase/bin/stop-hbase.sh
http://10.10.71.1:60030/regionserver.jsp
November 18th, 2010 - Backing up a Hadoop cluster's NameNode
<property>
<name>dfs.name.dir</name>
<value>/pvdata/hadoopdata/name/,/opt/hadoopdata/name/</value>
</property>
fs.checkpoint.size defines the maximum size of the edits log; once it is exceeded, a checkpoint is forced even if the checkpoint time interval has not yet elapsed. The default is 64 MB.
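For reference, the two knobs that control checkpointing, shown here with the stock Hadoop 0.20 defaults (one hour and 64 MB):

```xml
<property>
<name>fs.checkpoint.period</name>
<value>3600</value>
</property>
<property>
<name>fs.checkpoint.size</name>
<value>67108864</value>
</property>
```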
<property>
<name>fs.checkpoint.dir</name>
<value>/opt/hadoopdata/secondname,/pvdata/hadoopdata/secondname</value>
<description>Determines where on the local filesystem the DFS secondary
name node should store the temporary images to merge.
If this is a comma-delimited list of directories then the image is
replicated in all of the directories for redundancy.
</description>
</property>
November 17th, 2010 - A great tragedy: Hadoop's rmr and trash
First, keep tight control over each user's permissions on Hadoop, so that every user can only touch their own directories. Operate as the Hadoop superuser as little as possible; that alone reduces the chance of a fatal mistake. Hadoop's rm and rmr commands are appallingly designed: there is no confirmation prompt at all, they just delete. Someone raised this with the project, but the reply was that the trash mechanism already exists, so no prompt is needed. Speechless... As for Hadoop's trash feature: unfortunately, trash had not been configured beforehand, so the data was simply gone. After this accident, trash was configured immediately, with a retention period of seven days.
<property>
<name>fs.trash.interval</name>
<value>10080</value>
<description>
Number of minutes between trash checkpoints. If zero, the trash feature is disabled.
</description>
</property>
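A sanity check that the interval above matches the seven-day retention mentioned in the post:

```shell
# fs.trash.interval is in minutes; 7 days = 7 * 24 * 60 minutes.
RETENTION=$((7 * 24 * 60))
echo "$RETENTION"   # prints 10080
```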
hadoop fs -put *.txt /user/oplog/test
hadoop fs -rmr /user/oplog/test
hadoop fs -ls /user/oplog/.Trash/Current/user/oplog
drwxr-xr-x - oplog oplog 0 2010-11-16 10:44 /user/oplog/.Trash/Current/user/oplog/test
hadoop fs -mv /user/oplog/.Trash/Current/user/oplog/test /user/oplog/
hadoop fs -ls /user/oplog/.Trash/Current/user/oplog
drwxr-xr-x - oplog oplog 0 2010-11-16 10:44 /user/oplog/.Trash/Current/user/oplog/test
drwxr-xr-x - oplog oplog 0 2010-11-16 10:47 /user/oplog/.Trash/Current/user/oplog/test.1
September 19th, 2010 - Setting mapred.tasktracker.map.tasks.maximum in Hadoop
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>8</value>
<description>The maximum number of map tasks that will be run
simultaneously by a task tracker.
</description>
</property>
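One way to pick a value, using a common rule of thumb (an assumption on my part, not taken from this post): size map slots against the core count, leaving a couple of cores for reduce slots and the datanode/tasktracker daemons themselves.

```shell
# Derive a starting value for mapred.tasktracker.map.tasks.maximum
# from the number of online cores on the tasktracker host.
cores=$(nproc)
map_slots=$(( cores > 2 ? cores - 2 : 1 ))
echo "suggested mapred.tasktracker.map.tasks.maximum = $map_slots"
```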
September 14th, 2010 - Using Hadoop and Hive from outside the cluster
export JAVA_HOME=/opt/java/jdk
export HADOOP_CONF_DIR=/opt/sohuhadoop/conf
export HADOOP_HOME=/opt/sohuhadoop/hadoop
export HIVE_HOME=/opt/sohuhadoop/hive
vi /etc/hosts
10.10.1.1 hadoop-master. hadoop-master
September 13th, 2010 - Permission issues in Hadoop and Hive
hadoop fs -chmod -R //
hadoop fs -chown -R : //
For each log's MetaStore, create a separate database in MySQL
Create a separate conf directory for each user; for example, user test's Hive conf directory sits at /opt/sohuhadoop/hive/conf/test
In that test directory, edit hive-default.xml to point at the corresponding db
Start a dedicated hiveserver instance listening on its own port: HIVE_PORT=10020 nohup hive --config $HIVE_HOME/conf/test --service hiveserver &
In JDBC, connect to the matching port, e.g. 10020
September 2nd, 2010 - Setting up an Eclipse-based Hadoop test environment
Download
// MapperTest.java
package test; // the original package and import lines were lost in extraction

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapperTest extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // the original field extraction was garbled; splitting on tab is an assumption
        String userid = value.toString().split("\t")[0];
        context.write(new Text(userid), one);
    }
}

// ReducerTest.java
package test;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ReducerTest extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

// DriverTest.java
package test;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class DriverTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: DriverTest <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "DriverTest"); // job name lost in extraction
        job.setJarByClass(DriverTest.class);
        job.setMapperClass(MapperTest.class);
        job.setCombinerClass(ReducerTest.class);
        job.setReducerClass(ReducerTest.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // compress the job output with gzip (0.20-era property names, reconstructed)
        conf.setBoolean("mapred.output.compress", true);
        conf.setClass("mapred.output.compression.codec", GzipCodec.class, CompressionCodec.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}