

Installation Environment

Virtual machines:

  • 192.168.0.2 master
  • 192.168.0.3 slave1
  • 192.168.0.4 slave2
  • Java version: 1.7.0_80 (downloaded from Oracle)
  • Hadoop version: 2.6.0 (downloaded from Apache)

Installation Steps

  1. Create a new user:
    useradd hadoop
    Grant the hadoop user sudo privileges:
    vim /etc/sudoers
    hadoop  ALL=(ALL)       NOPASSWD: ALL
    
    sudo chown hadoop:hadoop -R /usr/local
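    
    Editing /etc/sudoers directly is risky: one syntax error can lock you out of sudo. A safer sketch, assuming the system supports /etc/sudoers.d drop-in files (stock CentOS 6 does):
    echo 'hadoop ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/hadoop   # run as root
    chmod 440 /etc/sudoers.d/hadoop   # sudo ignores drop-ins with loose permissions
    visudo -c                         # validate sudoers syntax before logging out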

  2. Download the JDK and extract it to /usr/local
  3. Download Hadoop and extract it to /usr/local
  4. Set environment variables
    1. export JAVA_HOME=/usr/local/jdk1.7.0_80
      export HADOOP_HOME=/usr/local/hadoop-2.6.0
      export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
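    
    These exports only last for the current shell. A minimal way to make them permanent for the hadoop user (appending to ~/.bashrc; /etc/profile works just as well) and to verify them:
      echo 'export JAVA_HOME=/usr/local/jdk1.7.0_80' >> ~/.bashrc
      echo 'export HADOOP_HOME=/usr/local/hadoop-2.6.0' >> ~/.bashrc
      echo 'export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH' >> ~/.bashrc
      source ~/.bashrc
      java -version      # should report "1.7.0_80"
      hadoop version     # should report Hadoop 2.6.0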

  5. Passwordless SSH login
    1. ssh-keygen -t rsa
      cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
      
      To skip the first-connection prompt "Are you sure you want to continue connecting (yes/no)?":
      vim /etc/ssh/ssh_config
      Change StrictHostKeyChecking ask to StrictHostKeyChecking no
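      
      The public key also has to reach both slaves, and sshd insists on strict permissions for ~/.ssh. A sketch for distributing and verifying the key (assumes the hadoop user already exists on the slaves and password login still works):
      chmod 700 /home/hadoop/.ssh
      chmod 600 /home/hadoop/.ssh/authorized_keys
      ssh-copy-id hadoop@slave1   # prompts for the hadoop password once
      ssh-copy-id hadoop@slave2
      ssh slave1 hostname         # should print "slave1" with no password prompt
      ssh slave2 hostname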

  6. Set the hostname
    1. vim /etc/sysconfig/network
      NETWORKING=yes
      HOSTNAME=master
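      
      On CentOS 6 this file is only read at boot; to apply the new name to the running system as well (a small sketch, using slave1/slave2 on the other hosts):
      hostname master   # set the hostname for the current session
      hostname          # verify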

  7. Set up local hostname resolution by adding the following records:
    1. vim /etc/hosts
      192.168.0.2 master
      192.168.0.3 slave1
      192.168.0.4 slave2
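      
      A quick check, run on every node, that all three names resolve and are reachable:
      ping -c 1 master && ping -c 1 slave1 && ping -c 1 slave2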

  8. Disable SELinux

    vim /etc/selinux/config 
    SELINUX=disabled
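    
    The config file change only applies after a reboot; to turn enforcement off for the current session too:
    setenforce 0    # switch to permissive mode immediately
    getenforce      # should now print Permissive (Disabled after a reboot)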

  9. Turn off the firewall

    service iptables stop
    chkconfig iptables off
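    
    Verify that the firewall is stopped now and stays off across reboots:
    service iptables status        # should report that the firewall is not running
    chkconfig --list iptables      # all runlevels should show "off"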

Configure the master host

All of the following configuration files live under $HADOOP_HOME/etc/hadoop/.

core-site.xml:

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <!-- default filesystem URI; the NameNode runs on master -->
                <value>hdfs://master:9000</value>
        </property>
        <property>
                <name>io.file.buffer.size</name>
                <value>131072</value>
        </property>
</configuration>
hdfs-site.xml:

<configuration>
        <property>
                <name>dfs.namenode.name.dir</name>
                <!-- NameNode metadata directory; create it in advance with write
                     permission, otherwise this setting is ignored -->
                <value>file:///usr/local/hadoop_data/dfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <!-- DataNode block directory; create it in advance with write
                     permission, otherwise this setting is ignored -->
                <value>file:///usr/local/hadoop_data/dfs/data</value>
        </property>
</configuration>
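
The directories above must exist before HDFS is formatted. Create them on the master and on both slaves, owned by the hadoop user:

mkdir -p /usr/local/hadoop_data/dfs/name /usr/local/hadoop_data/dfs/data
chown -R hadoop:hadoop /usr/local/hadoop_data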
yarn-site.xml:

<configuration>
 
<!-- Site specific YARN configuration properties -->
 
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>master</value>
        </property>
</configuration>
mapred-site.xml (copy it from mapred-site.xml.template if it does not exist yet):

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
                <description>Execution framework set to Hadoop YARN.</description>
        </property>
</configuration>
slaves (one DataNode/NodeManager hostname per line):

slave1
slave2
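
With the master fully configured, push the whole config directory to both slaves so every node sees the same settings (a sketch assuming rsync is installed; scp -r works as well):

for h in slave1 slave2; do
    rsync -a $HADOOP_HOME/etc/hadoop/ $h:$HADOOP_HOME/etc/hadoop/
done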

Configure the slave hosts

  • The configuration is almost identical to the master's; the only differences are:
    1. the hostname
    2. the IP address

If you are building a test environment on virtual machines, just finish configuring the master, clone it twice, and fix up the hostname and IP on each clone.

Start Hadoop

  • hdfs namenode -format master # format HDFS; run once, on the master only
  • start-dfs.sh
  • start-yarn.sh
  • jps # list the Java daemons that were started (expected output sketched below)
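
If everything came up, jps should show roughly the following daemons (PIDs will differ; the SecondaryNameNode runs on the master by default):

# on master:
NameNode
SecondaryNameNode
ResourceManager
# on slave1 / slave2:
DataNode
NodeManager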
Check the cluster status in a browser:
http://master:8088 (YARN ResourceManager)
http://master:50070 (HDFS NameNode)

Testing

# (optional) clean up leftovers from earlier runs
hadoop fs -rmr /input
hadoop fs -rmr /output
hadoop fs -rmr /tmp
# prepare a small input file and upload it to HDFS
echo "hello world" > input.txt
hadoop fs -mkdir /input
hadoop fs -put input.txt /input/
rm -f input.txt
# run the bundled WordCount example
hadoop jar /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output
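
When the job finishes, each reducer writes its result to a part-r-* file under /output; a quick way to inspect it:

hadoop fs -ls /output                 # look for _SUCCESS and part-r-00000
hadoop fs -cat /output/part-r-00000   # should print "hello 1" and "world 1"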
# Specify the jars the compilation depends on; WordCount.java is the standard
# example from the Hadoop MapReduce tutorial and is easy to find online
javac -classpath $HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar WordCount.java
jar cf wc.jar WordCount*.class
hdfs dfs -rmr /out*   # clear any previous output directory first
hadoop jar wc.jar WordCount /input /output