Preparation

Make the following changes to the configuration files.

core-site.xml

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <!-- fs.defaultFS points at the HA nameservice (cluster) name,
                     not a single NameNode host -->
                <value>hdfs://mycluster</value>
        </property>
        <property>
                <name>io.file.buffer.size</name>
                <value>131072</value>
        </property>
        <property>
                <name>ha.zookeeper.quorum</name>
                <value>master:2181,slave1:2181,slave2:2181</value>
        </property>
</configuration>

hdfs-site.xml

<configuration>
        <property>
                <name>dfs.namenode.name.dir</name>
                <!-- NameNode metadata directory: create it ahead of time with
                     write permission; if the directory does not exist this
                     entry is ignored -->
                <value>file:///usr/local/hadoop-2.6.0/dfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <!-- DataNode data directory: create it ahead of time with
                     write permission; if the directory does not exist this
                     entry is ignored. With multiple disks, it is best to list
                     one directory per disk to increase read/write throughput -->
                <value>file:///usr/local/hadoop-2.6.0/dfs/data</value>
        </property>
        <property>
                <name>dfs.permissions</name>
                <value>false</value>
        </property>
        <property>
                <name>dfs.hosts.exclude</name>
                <value>/usr/local/hadoop-2.6.0/exclude</value>
        </property>

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
  <description>
    Comma-separated list of nameservices.
  </description>
</property>

<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
  <description>
    The prefix for a given nameservice, contains a comma-separated
    list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
  </description>
</property>

<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>master:8020</value>
  <description>
    RPC address for namenode1 of mycluster
  </description>
</property>

<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>slave1:8020</value>
  <description>
    RPC address for namenode2 of mycluster
  </description>
</property>

<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>master:50070</value>
  <description>
    The address and the base port where the dfs namenode1 web ui will listen on.
  </description>
</property>

<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>slave1:50070</value>
  <description>
    The address and the base port where the dfs namenode2 web ui will listen on.
  </description>
</property>


<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://master:8485;slave1:8485;slave2:8485/mycluster</value>
  <description>A directory on shared storage between the multiple namenodes
  in an HA cluster. This directory will be written by the active and read
  by the standby in order to keep the namespaces synchronized. This directory
  does not need to be listed in dfs.namenode.edits.dir above. It should be
  left empty in a non-HA cluster.
  </description>
</property>


<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
  <description>
    Whether automatic failover is enabled. See the HDFS High
    Availability documentation for details on automatic HA
    configuration.
  </description>
</property>

<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/usr/local/hadoop-2.6.0/dfs/journal/</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence(hadoop)</value>
  <description>
    A list of scripts or Java classes used to fence the Active NameNode during
    failover. Although the JournalNodes guarantee that only one NameNode can
    write edits at a time (which is essential for edit-log consistency), during
    a failover the previous Active may still be alive, and clients still
    connected to it could be served stale data. This setting names a shell
    script or Java program that SSHes to the Active NameNode and kills the
    NameNode process. Two built-in methods are available (see the official
    documentation for details):
    1) sshfence: SSH to the Active NameNode and kill its process. The local
       machine must already have passwordless (RSA key) SSH access to it.
    2) shell: run an arbitrary shell command to fence the Active NameNode.
  </description>
</property>

<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
</configuration>
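Once hdfs-site.xml is in place, a quick sanity check is to ask Hadoop which HA settings it actually picked up. This is a sketch assuming the hdfs CLI is run from the Hadoop install directory; the keys queried are the ones set above.

```shell
# Verify the HA configuration is being read (run from $HADOOP_HOME).
./bin/hdfs getconf -confKey dfs.nameservices            # expect: mycluster
./bin/hdfs getconf -confKey dfs.ha.namenodes.mycluster  # expect: nn1,nn2
```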

yarn-site.xml

<configuration>
 
<!-- Site specific YARN configuration properties -->
 
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>master</value>
        </property>
</configuration>

mapred-site.xml

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <!-- run MapReduce on YARN -->
                <value>yarn</value>
                <description>Execution framework set to Hadoop YARN.</description>
        </property>
        <property>
                <name>mapreduce.task.io.sort.mb</name>
                <value>2000</value>
                <description>The total amount of buffer memory to use while
                sorting files, in megabytes.</description>
        </property>
</configuration>

For more settings and what each one means, see the Hadoop HA documentation.

Startup

  1. First start ZooKeeper on every node
    1. ./bin/zkServer.sh start (every machine)
  2. Then, on one NameNode, create the HA znode in ZooKeeper
    1. ./bin/hdfs zkfc -formatZK (primary namenode)
  3. Start the JournalNode daemon on each node
    1. ./sbin/hadoop-daemon.sh start journalnode (every machine)
  4. On the primary NameNode, format the NameNode and JournalNode directories
     with ./bin/hadoop namenode -format
    1. ./bin/hadoop namenode -format mycluster (primary namenode)
  5. On the primary NameNode, start the NameNode process with
     ./sbin/hadoop-daemon.sh start namenode
    1. ./sbin/hadoop-daemon.sh start namenode (primary namenode)
  6. On the standby node, run the first command below: it formats the standby
     NameNode's directory and copies the metadata over from the primary,
     without re-formatting the JournalNode directory. Then start the standby
     NameNode process with the second command.
    1. ./bin/hdfs namenode -bootstrapStandby (standby namenode)
    2. ./sbin/hadoop-daemon.sh start namenode (standby namenode)
  7. On both NameNodes, run the following
    1. ./sbin/hadoop-daemon.sh start zkfc (all namenodes)
  8. On every DataNode, start the DataNode daemon
    1. ./sbin/hadoop-daemon.sh start datanode (all datanodes)
  For subsequent restarts it is enough to:
  • start ZooKeeper
  • ./sbin/start-dfs.sh
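The first-time startup steps above can be sketched as one script, driven from the primary NameNode. This is only a sketch: passwordless SSH between the nodes and the install paths (in particular ZK_HOME) are assumptions to adjust for your environment.

```shell
#!/usr/bin/env bash
# First-time HA startup sequence, run from the primary NameNode (master).
# Assumptions: passwordless SSH to all nodes; adjust the paths below.
set -euo pipefail

ZK_HOME=/usr/local/zookeeper          # assumed ZooKeeper install location
HADOOP_HOME=/usr/local/hadoop-2.6.0
ALL_NODES="master slave1 slave2"

for h in $ALL_NODES; do               # step 1: ZooKeeper everywhere
  ssh "$h" "$ZK_HOME/bin/zkServer.sh start"
done

"$HADOOP_HOME/bin/hdfs" zkfc -formatZK      # step 2: create the HA znode

for h in $ALL_NODES; do               # step 3: JournalNodes everywhere
  ssh "$h" "$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode"
done

"$HADOOP_HOME/bin/hadoop" namenode -format mycluster   # step 4
"$HADOOP_HOME/sbin/hadoop-daemon.sh" start namenode    # step 5

# step 6: standby copies metadata from the primary, then starts
ssh slave1 "$HADOOP_HOME/bin/hdfs namenode -bootstrapStandby && \
            $HADOOP_HOME/sbin/hadoop-daemon.sh start namenode"

for h in master slave1; do            # step 7: ZKFC on both NameNodes
  ssh "$h" "$HADOOP_HOME/sbin/hadoop-daemon.sh start zkfc"
done

for h in $ALL_NODES; do               # step 8: DataNodes
  ssh "$h" "$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode"
done
```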

Testing HDFS HA failover

On either NameNode machine, use the jps command to find the NameNode process
ID, kill it with kill -9, and watch whether the NameNode on the other node
transitions from standby to active:

jps           # look up the NameNode PID
kill -9 xxx   # kill that PID

Then check the two web UIs:
http://master:50070/
http://slave1:50070/
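The kill-and-observe test can also be done from the command line instead of the web UIs. The `find_nn_pid` helper below is a hypothetical convenience for parsing jps output; `nn1`/`nn2` are the NameNode IDs from hdfs-site.xml, and `hdfs haadmin -getServiceState` is the standard tool for querying HA state.

```shell
# Extract the NameNode PID from jps output ("PID ClassName" per line).
find_nn_pid() {
  awk '$2 == "NameNode" {print $1}'
}

# On the Active NameNode host:
#   pid=$(jps | find_nn_pid)
#   kill -9 "$pid"
# Then query both NameNodes; one should now report "active":
#   ./bin/hdfs haadmin -getServiceState nn1
#   ./bin/hdfs haadmin -getServiceState nn2
```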