Hadoop Environment Setup (Cluster)
- Environment preparation (CentOS 7 × 3)
- Hadoop parameter configuration
- Starting Hadoop
Environment Preparation
- Create a hadoop user on Linux (run as root):
useradd -m hadoop -s /bin/bash
passwd hadoop
# Grant the hadoop user sudo privileges (run as root)
visudo
hadoop ALL=(ALL) NOPASSWD:ALL # append this line
- Configure SSH:
# Install SSH
sudo yum install openssh-clients
sudo yum install openssh-server
# Check that the installation succeeded
rpm -qa | grep ssh
# Generate an SSH key for the local machine
cd ~/.ssh/ # if this directory does not exist, run ssh localhost once first
ssh-keygen -t rsa # generate an RSA key pair; usually just press Enter three times
cat id_rsa.pub >> authorized_keys # authorize the key
chmod 600 ./authorized_keys # fix the file permissions
ssh localhost # verify passwordless login to the local machine
# Cluster-wide SSH (without DNS)
vi /etc/hosts # map each node's IP address to its hostname
# 192.168.0.160 Master
# 192.168.0.161 Slave1
# 192.168.0.162 Slave2
# On the Master host
cd ~/.ssh/ # directory holding the RSA key pair
rm -f id_rsa* # remove any old id_rsa / id_rsa.pub files
ssh-keygen -t rsa
cat id_rsa.pub >> authorized_keys
ssh-copy-id -i ~/.ssh/id_rsa.pub Master # run on Slave1 (after generating its own key pair) to append its public key to Master's authorized_keys
ssh-copy-id -i ~/.ssh/id_rsa.pub Master # likewise, run on Slave2
more authorized_keys # inspect the collected keys
chmod 600 authorized_keys # fix permissions on authorized_keys
# Distribute the authorized_keys file to the other hosts
scp /home/hadoop/.ssh/authorized_keys Slave1:/home/hadoop/.ssh/ # copy to Slave1
scp /home/hadoop/.ssh/authorized_keys Slave2:/home/hadoop/.ssh/ # copy to Slave2
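Once the keys have been distributed, the whole mesh can be checked from the Master in one loop. A minimal sketch, assuming the three hostnames from the /etc/hosts mapping above:

```shell
#!/bin/sh
# Check passwordless SSH from this host to every node in the cluster.
# BatchMode=yes makes ssh fail instead of prompting for a password.
NODES="Master Slave1 Slave2"
for h in $NODES; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null; then
        echo "$h: passwordless OK"
    else
        echo "$h: passwordless login FAILED"
    fi
done
```

Any FAILED line means that node's key exchange needs to be repeated before start-dfs.sh can reach it.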
- Configure the Java environment:
rpm -ql java-1.7.0-openjdk-devel | grep '/bin/javac' # locate javac to find the JDK install path
vi ~/.bashrc # or /etc/profile for a global setting that applies to all users
# Set the environment variables
### ------------------------------------------
export JAVA_HOME=/usr/java/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
### ------------------------------------------
source ~/.bashrc # apply the changes
echo $JAVA_HOME # verify the variable value
java -version # check the current Java version
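JAVA_HOME is whatever is left of the javac path after stripping the trailing /bin/javac, so it can be derived from the rpm -ql output rather than typed by hand. A sketch (the example path stands in for whatever rpm -ql reported on your system):

```shell
# Derive JAVA_HOME by stripping /bin/javac off the javac path.
JAVAC_PATH=/usr/java/jdk1.7.0_79/bin/javac   # substitute the path rpm -ql printed
JAVA_HOME=$(dirname "$(dirname "$JAVAC_PATH")")
echo "$JAVA_HOME"   # /usr/java/jdk1.7.0_79
```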
Hadoop Installation and Configuration
- Install Hadoop (on the Master host)
cat ~/download/hadoop-2.x.x.tar.gz.mds | grep 'MD5' # show the published MD5 checksum
md5sum ~/download/hadoop-2.x.x.tar.gz | tr "a-z" "A-Z" # compute the MD5 and upper-case it for easy comparison
sudo tar -zxf ~/download/hadoop-2.x.x.tar.gz -C /usr # extract into /usr
cd /usr/
sudo mv ./hadoop-2.x.x/ ./hadoop # rename the directory to hadoop
sudo chown -R hadoop:hadoop ./hadoop # fix ownership
cd /usr/hadoop
./bin/hadoop version
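The two checksum commands above are compared by eye; the comparison can also be scripted. A sketch, with file names assumed to match the download (Apache .mds files print the digest in space-separated byte groups, so the spaces are stripped before comparing):

```shell
#!/bin/sh
# Compare the published MD5 (from the .mds file) with a freshly computed one.
# TARBALL is a placeholder for the real download name.
TARBALL=~/download/hadoop-2.x.x.tar.gz
MDS_FILE=$TARBALL.mds
expected=$(grep -i 'MD5' "$MDS_FILE" | sed 's/.*=//' | tr -d ' \t' | tr 'a-z' 'A-Z')
actual=$(md5sum "$TARBALL" | awk '{print $1}' | tr 'a-z' 'A-Z')
if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "checksum OK"
else
    echo "checksum MISMATCH: expected [$expected] got [$actual]"
fi
```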
- Add the hostname of every DataNode to the slaves file (etc/hadoop/slaves in Hadoop 2.x)
- Edit core-site.xml as follows:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/hadoop/tmp</value>
</property>
</configuration>
- Edit hdfs-site.xml as follows (dfs.replication is 1 here; with two DataNodes it can be raised to 2):
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>Master:50090</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/hadoop/tmp/dfs/data</value>
</property>
</configuration>
- Edit mapred-site.xml (only mapred-site.xml.template exists by default; copy it to mapred-site.xml first) as follows. Note that the two mapreduce.tasktracker.* properties are Hadoop 1.x-era settings and have no effect under YARN:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>Master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>Master:19888</value>
</property>
<property>
<name>mapreduce.tasktracker.map.tasks.maximum</name>
<value>2</value>
<final>true</final>
</property>
<property>
<name>mapreduce.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
<final>true</final>
</property>
</configuration>
- Edit yarn-site.xml as follows (set resource.memory-mb and resource.cpu-vcores to match each node's actual resources):
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>8</value>
</property>
</configuration>
- Package the hadoop directory on the Master node and distribute it to the other nodes
cd /usr
sudo rm -r ./hadoop/tmp # remove Hadoop temporary files
sudo rm -r ./hadoop/logs/* # remove log files
tar -zcf ~/hadoop.master.tar.gz ./hadoop # compress before copying
cd ~
scp ./hadoop.master.tar.gz Slave1:/home/hadoop
scp ./hadoop.master.tar.gz Slave2:/home/hadoop
# On the Slave1 and Slave2 nodes
sudo rm -r /usr/hadoop # remove any old installation (if present)
sudo tar -zxf ~/hadoop.master.tar.gz -C /usr
sudo chown -R hadoop:hadoop /usr/hadoop
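The scp/extract/chown sequence above has to be repeated for each slave; it can also be wrapped in one loop. A sketch with the hostnames assumed from earlier; DRY_RUN defaults to printing the commands so the loop can be inspected before it touches any host:

```shell
#!/bin/sh
# Push the packaged Hadoop tree to each slave and unpack it there.
# With DRY_RUN=1 (the default here) the commands are only printed, not run.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }
SLAVES="Slave1 Slave2"
for h in $SLAVES; do
    run scp ~/hadoop.master.tar.gz "$h:/home/hadoop/"
    run ssh "$h" "sudo rm -rf /usr/hadoop && sudo tar -zxf ~/hadoop.master.tar.gz -C /usr && sudo chown -R hadoop:hadoop /usr/hadoop"
done
```

Re-run with DRY_RUN=0 once the printed commands look right.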
- Configure the Hadoop environment variables
vi ~/.bashrc
export HADOOP_HOME=/usr/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export JAVA_LIBRARY_PATH=/usr/hadoop/lib/native # use the native libraries
source ~/.bashrc
Starting Hadoop
- Start the Hadoop cluster
hdfs namenode -format # on first startup, format the NameNode on the Master node
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
# On the Master node, jps should show the NameNode, ResourceManager, SecondaryNameNode, and JobHistoryServer processes
# On the Slave nodes it should show the DataNode and NodeManager processes
jps
hdfs dfsadmin -report # check that the DataNodes started correctly
*** If a DataNode fails to start, delete the /usr/hadoop/tmp directory on every node (including the Slaves) and run hdfs namenode -format again
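Checking the jps output on each node by eye works, but it can be scripted; a sketch for the Master node, with the daemon list taken from the comment above:

```shell
#!/bin/sh
# Report which of the expected Master-side daemons show up in jps.
EXPECTED="NameNode SecondaryNameNode ResourceManager JobHistoryServer"
RUNNING=$(jps 2>/dev/null)
for d in $EXPECTED; do
    if echo "$RUNNING" | grep -qw "$d"; then
        echo "$d: up"
    else
        echo "$d: MISSING"
    fi
done
```

On the Slave nodes the same loop would check for DataNode and NodeManager instead.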
- Run WordCount
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -mkdir input
hdfs dfs -put XXX.txt input
hadoop jar /wordcount.jar input output
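What the WordCount job computes can be previewed locally with plain shell on a small file, which is handy for sanity-checking the job's expected output (the demo file below is made up):

```shell
# The same word/count pairs WordCount emits, computed locally:
printf 'hello world\nhello hadoop\n' > /tmp/wc_demo.txt
tr -s ' ' '\n' < /tmp/wc_demo.txt | sort | uniq -c | sort -rn
# first line shows:  2 hello
```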
- Shut down the Hadoop cluster
stop-yarn.sh
stop-dfs.sh
mr-jobhistory-daemon.sh stop historyserver
- Source: http://blog.csdn.net/l371036075_yue/article/details/52548818
Apache Hadoop