Adding a New Node to Hadoop: A Practical Walkthrough

jopen, 10 years ago

The cluster already has namenode and datanode1. This walkthrough adds a new node, datanode2.

 

Step 1: Change the hostname of the node being added

hadoop@datanode1:~$ vim /etc/hostname

datanode2

 

Step 2: Edit the hosts file

hadoop@datanode1:~$ vim /etc/hosts
192.168.8.4     datanode2
127.0.0.1       localhost
127.0.1.1       ubuntu
192.168.8.2     namenode
192.168.8.3     datanode1
192.168.8.4     datanode2    (this line was added)

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
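
Since every later step addresses machines by name, it can help to confirm that the hosts file really lists each one before moving on. The following is a minimal sketch, not part of the original post: `check_hosts` is a hypothetical helper name, and it only inspects a hosts-format file, it does not test network reachability. The same entries are typically needed on namenode and datanode1 as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that each expected hostname appears in a
# hosts-format file. Usage: check_hosts FILE NAME...
check_hosts() {
  local file="$1"; shift
  local missing=0 name
  for name in "$@"; do
    # Strip comments, then look for the name in any column after the address.
    if ! awk -v n="$name" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }' "$file"; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

For the cluster in this post the call would be `check_hosts /etc/hosts namenode datanode1 datanode2`; it prints one `missing:` line per absent name and returns a non-zero status if any name is absent.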

 

Step 3: Change the IP address

 

Step 4: Reboot


Step 5: Configure passwordless SSH

1. Generate a key pair

hadoop@datanode2:~$ ssh-keygen -t rsa -P ""  
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
/home/hadoop/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
34:45:84:85:6e:f3:9e:7a:c0:f1:a4:ef:bf:30:a6:74 hadoop@datanode2
The key's randomart image is:
+--[ RSA 2048]----+
|         *=      |
|        o.       |
|       .o        |
|       .=..      |
|       oSB       |
|        + o      |
|        .+E.     |
|       . +=o     |
|        o+..o.   |
+-----------------+

2. Copy the public key to namenode

hadoop@datanode2:~$ cd ~/.ssh  
hadoop@datanode2:~/.ssh$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
hadoop@datanode2:~/.ssh$ scp ./id_rsa.pub hadoop@namenode:/home/hadoop
hadoop@namenode's password: 
id_rsa.pub                                    100%  398     0.4KB/s   00:00   

 

3. Append the public key to authorized_keys

hadoop@namenode:~/.ssh$ cat ../id_rsa.pub >> authorized_keys
hadoop@namenode:~/.ssh$ cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDuOOD8R7OfNSUhGPZhQWCfC0yTeM6+txWSo3LiJjEWZbH512ymKIEiNRjCzTiRjLEqWGadAPVbip3jLuOHFpk89v7D6q8QH4ilBjLtsaVxmhb77w3yGrXlHJ8+g3QtS8VmjGEyZ86oeM5F9UM8F8QmK9mxXOWhqt3xvufetr7o7acV3APEHH1hvvkFImim2sT/iNi/Nxsch176byUS6y86gOTgznVH8OIx8MDmdKSLjqWPSCTrpvXPESlZvpLm4YSN2cYoKaxcedaynzOhXgAC0GLdq1k07eFmerUwpBT+xTzTRJPquYawK+MPf6+lnLm89u+bewdBZLdunCKhbCK3 hadoop@ubuntu3
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCssQnDzo5uhPn93bVqj+nEpzgQBipc1WgasOeFQV7ljyNlFHhOPVS6G3oHpvSrbjg3aK1MqxmCw0VokuuO5eoHwqh0alQw46eEmunzrnwuhhFpAU9V4t7LJ5pYuxZOioXbsJKxCetOY6G2lKRmyk2Z/MIMpPW+UFebt150+oYXcKKYSBBJoLmThH3bWW2CesAokIe8gCQ3rIYsHfA8rNuwxEnrL8fC2XlWODTahjHD5bymBO4rd3uiJxuTv7/r243t0hrimjhJ7uUIyPcIRYDchPmmO9DFVEBtYloLmqQQs/ZOxDiX7GF+YK7KC7Ayo1kL8VuwP90dqIhpaJmP96zV hadoop@ubuntu2
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDbeTMrOtMZ8gurJyzoSVFpJbtXzUYDElXJcfm0O+FRpigxoIePPHiQc5vi7kabnLSiEv+94YDMclxZpXFjR0TXz6IJOVdPxFPqovY+GzrYVXEXj3HhbBWKC4sFUvGFGSZr8rM3R5OE2wYIZzOKdX9c6Ak5uIE7BUSuXzaiFctYXIvu37TObYZ44vDQGv9/mPsqP4Qnyx4czTLD1VmOeUHA5iQTKLt4K0HNE3i+a3mEEBMxBwETUI/6dcmvTxjEe7cy48YPadr5UT0/xgTub/OdmkBfvfT6fPDVlHtRP5jQiFapFyzL/BXiObqkSlrJbLKWTczS8J6SfsKWsSZfOPzL hadoop@datanode2

 

4. Distribute the combined authorized_keys file to the other nodes

hadoop@namenode:~$ scp ./.ssh/authorized_keys hadoop@datanode1:/home/hadoop/.ssh/authorized_keys 
authorized_keys                                                                             100% 1190     1.2KB/s   00:00    
hadoop@namenode:~$ scp ./.ssh/authorized_keys hadoop@datanode2:/home/hadoop/.ssh/authorized_keys 
authorized_keys                                                                             100% 1190     1.2KB/s   00:00
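
The append with `cat ... >> authorized_keys` adds a duplicate line each time it is repeated, for example after regenerating and re-sending a key. The following is a minimal sketch of an append that is safe to rerun, under the assumption that one public key occupies exactly one line; `add_key` is a hypothetical helper name, not something from the original post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to an authorized_keys file
# only if that exact line is not already present.
# Usage: add_key PUBKEY_FILE AUTHORIZED_KEYS_FILE
add_key() {
  local pub="$1" auth="$2"
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  # -q quiet, -x whole-line match, -F fixed strings, -f read patterns from file
  if ! grep -qxFf "$pub" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  # Keep the file readable and writable by its owner only.
  chmod 600 "$auth"
}
```

On namenode this would be called as `add_key ~/id_rsa.pub ~/.ssh/authorized_keys`. Where it is installed, `ssh-copy-id hadoop@namenode` run from datanode2 performs a comparable append over SSH.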

 

5. An error you may run into

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0644 for '/home/jiangqixiang/.ssh/id_dsa' are too open.
It is recommended that your private key files are NOT accessible by others.
This private key will be ignored.
bad permissions: ignore key: /home/youraccount/.ssh/id_dsa
Fix (tighten the permissions on whichever private key the message names; on this node it is id_rsa):
chmod 700 id_rsa
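
The warning is triggered when the private key is accessible to the group or to other users. The following sketch checks for that condition and tightens the mode only when needed. `key_is_private` and `fix_key` are hypothetical helper names, and the sketch uses mode 600, the conventional choice for private keys, rather than the 700 shown above; both modes remove group and other access.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file grants no permissions
# to group or others (characters 5-10 of the ls mode string).
key_is_private() {
  [ "$(ls -ld "$1" | cut -c5-10)" = "------" ]
}

# Hypothetical helper: restrict the key to its owner if it is too open.
fix_key() {
  key_is_private "$1" || chmod 600 "$1"
}
```

A typical call would be `fix_key ~/.ssh/id_rsa`, after which `ssh` should stop ignoring the key.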

 

Step 6: Edit the configuration file on namenode

hadoop@namenode:~$ cd hadoop-1.2.1/conf
hadoop@namenode:~/hadoop-1.2.1/conf$ vim slaves 
datanode1
datanode2
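
Editing the slaves file by hand works well for one node. When the step is scripted, an append that skips names already listed avoids duplicate entries. The following is a minimal sketch; `add_slave` is a hypothetical helper name, and it assumes the file holds one hostname per line, as shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add a hostname to a slaves file unless it is
# already listed. Usage: add_slave HOSTNAME SLAVES_FILE
add_slave() {
  local host="$1" file="$2"
  touch "$file"
  # If the file does not end with a newline, add one so the new name
  # starts on its own line.
  [ -n "$(tail -c1 "$file")" ] && echo >> "$file"
  # -x whole-line match, -F fixed string
  grep -qxF "$host" "$file" || echo "$host" >> "$file"
}
```

With the layout used in this post, the call would be `add_slave datanode2 ~/hadoop-1.2.1/conf/slaves`.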

 

Step 7: Rebalance the cluster

hadoop@namenode:~/hadoop-1.2.1/conf$ start-balancer.sh
Warning: $HADOOP_HOME is deprecated.


starting balancer, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-balancer-namenode.out

The following notes are quoted from another blog:

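1) Without balancing, the cluster stores new data on the new node, which reduces MapReduce efficiency.
2) threshold is the balancing threshold, expressed as a percentage, and defaults to 10%. The lower the value, the more evenly the nodes are balanced, but the longer balancing takes. Because the value is read as a percentage, the command below asks for 0.1%, which is much stricter than the default:
/app/hadoop/bin/start-balancer.sh -threshold 0.1
3) In hdfs-site.xml on namenode you can set the bandwidth the balancer may use (the value shown, 1 MB/s, is already the default):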
<property>
  <name>dfs.balance.bandwidthPerSec</name>  
  <value>1048576</value>  
  <description>  
    Specifies the maximum amount of bandwidth that each datanode   
    can utilize for the balancing purpose in term of   
    the number of bytes per second.   
  </description> 
</property>
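
To make the threshold in note 2 concrete: the balancer treats a datanode as balanced when its disk utilization differs from the cluster-wide utilization by no more than the threshold, measured in percentage points. The sketch below only illustrates that arithmetic and is not the balancer's implementation. `unbalanced_nodes` is a hypothetical helper name, and its input format (name, used, capacity per line) is an assumption made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lines of "name used capacity" on stdin and
# print the names whose utilization is more than THRESHOLD percentage
# points away from the overall utilization.
# Usage: unbalanced_nodes THRESHOLD
unbalanced_nodes() {
  awk -v t="$1" '
    { name[NR] = $1; used[NR] = $2; cap[NR] = $3; U += $2; C += $3 }
    END {
      if (C == 0) exit            # no capacity reported, nothing to compare
      avg = 100 * U / C           # cluster-wide utilization in percent
      for (i = 1; i <= NR; i++) {
        d = 100 * used[i] / cap[i] - avg
        if (d < 0) d = -d
        if (d > t) print name[i]
      }
    }'
}
```

With made-up figures, `printf 'datanode1 80 100\ndatanode2 0 100\n' | unbalanced_nodes 10` lists both nodes: the cluster sits at 40% while the nodes sit at 80% and 0%, each 40 points away. That is the situation right after an empty node joins.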


Step 8: Verify that it works

 

1. Start Hadoop

hadoop@namenode:~/hadoop-1.2.1$ start-all.sh
Warning: $HADOOP_HOME is deprecated.


starting namenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-namenode-namenode.out
datanode2: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-datanode2.out
datanode1: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-datanode1.out
namenode: starting secondarynamenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-secondarynamenode-namenode.out
starting jobtracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-jobtracker-namenode.out
datanode2: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-datanode2.out
datanode1: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-datanode1.out
hadoop@namenode:~/hadoop-1.2.1$ 

 

2. An error

Running the wordcount program fails:

hadoop@namenode:~/hadoop-1.2.1$ hadoop jar hadoop-examples-1.2.1.jar wordcount in out
Warning: $HADOOP_HOME is deprecated.


14/09/12 08:40:39 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop cause:org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.SafeModeException: JobTracker is in safe mode
        at org.apache.hadoop.mapred.JobTracker.checkSafeMode(JobTracker.java:5188)
        at org.apache.hadoop.mapred.JobTracker.getStagingAreaDir(JobTracker.java:3677)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)


org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.SafeModeException: JobTracker is in safe mode
        at org.apache.hadoop.mapred.JobTracker.checkSafeMode(JobTracker.java:5188)
        at org.apache.hadoop.mapred.JobTracker.getStagingAreaDir(JobTracker.java:3677)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)


        at org.apache.hadoop.ipc.Client.call(Client.java:1113)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at org.apache.hadoop.mapred.$Proxy2.getStagingAreaDir(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
        at org.apache.hadoop.mapred.$Proxy2.getStagingAreaDir(Unknown Source)
        at org.apache.hadoop.mapred.JobClient.getStagingAreaDir(JobClient.java:1309)
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:102)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:82)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

Fix:

hadoop@namenode:~/hadoop-1.2.1$ hadoop dfsadmin -safemode leave
Warning: $HADOOP_HOME is deprecated.


Safe mode is OFF
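
Forcing safe mode off is quick, but the NameNode normally leaves safe mode by itself once enough block reports have arrived, so waiting is often enough right after startup. The following is a minimal sketch of a polling loop. `wait_safemode_off` is a hypothetical helper name, and it assumes the status command prints the "Safe mode is OFF" text seen above. The built-in `hadoop dfsadmin -safemode wait` blocks in a similar way, without a retry limit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a status command repeatedly until its output
# contains "Safe mode is OFF", or give up after MAX_TRIES attempts.
# Usage: wait_safemode_off MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
wait_safemode_off() {
  local tries="$1" delay="$2"; shift 2
  local i=0
  while [ "$i" -lt "$tries" ]; do
    case "$("$@" 2>/dev/null)" in
      *"Safe mode is OFF"*) return 0 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_safemode_off 30 10 hadoop dfsadmin -safemode get && hadoop jar hadoop-examples-1.2.1.jar wordcount in out` would submit the job only once safe mode is reported off, checking every 10 seconds for up to 30 attempts.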

 

3. Test again

hadoop@namenode:~/hadoop-1.2.1$ hadoop jar hadoop-examples-1.2.1.jar wordcount in out
Warning: $HADOOP_HOME is deprecated.


14/09/12 08:48:26 INFO input.FileInputFormat: Total input paths to process : 2
14/09/12 08:48:26 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/09/12 08:48:26 WARN snappy.LoadSnappy: Snappy native library not loaded
14/09/12 08:48:28 INFO mapred.JobClient: Running job: job_201409120827_0003
14/09/12 08:48:29 INFO mapred.JobClient:  map 0% reduce 0%
14/09/12 08:48:47 INFO mapred.JobClient:  map 50% reduce 0%
14/09/12 08:48:48 INFO mapred.JobClient:  map 100% reduce 0%
14/09/12 08:48:57 INFO mapred.JobClient:  map 100% reduce 33%
14/09/12 08:48:59 INFO mapred.JobClient:  map 100% reduce 100%
14/09/12 08:49:02 INFO mapred.JobClient: Job complete: job_201409120827_0003
14/09/12 08:49:02 INFO mapred.JobClient: Counters: 30
14/09/12 08:49:02 INFO mapred.JobClient:   Job Counters 
14/09/12 08:49:02 INFO mapred.JobClient:     Launched reduce tasks=1
14/09/12 08:49:02 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=27285
14/09/12 08:49:02 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/12 08:49:02 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/12 08:49:02 INFO mapred.JobClient:     Rack-local map tasks=1
14/09/12 08:49:02 INFO mapred.JobClient:     Launched map tasks=2
14/09/12 08:49:02 INFO mapred.JobClient:     Data-local map tasks=1
14/09/12 08:49:02 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=12080
14/09/12 08:49:02 INFO mapred.JobClient:   File Output Format Counters 
14/09/12 08:49:02 INFO mapred.JobClient:     Bytes Written=48
14/09/12 08:49:02 INFO mapred.JobClient:   FileSystemCounters
14/09/12 08:49:02 INFO mapred.JobClient:     FILE_BYTES_READ=104
14/09/12 08:49:02 INFO mapred.JobClient:     HDFS_BYTES_READ=265
14/09/12 08:49:02 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=177680
14/09/12 08:49:02 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=48
14/09/12 08:49:02 INFO mapred.JobClient:   File Input Format Counters 
14/09/12 08:49:02 INFO mapred.JobClient:     Bytes Read=45
14/09/12 08:49:02 INFO mapred.JobClient:   Map-Reduce Framework
14/09/12 08:49:02 INFO mapred.JobClient:     Map output materialized bytes=110
14/09/12 08:49:02 INFO mapred.JobClient:     Map input records=2
14/09/12 08:49:02 INFO mapred.JobClient:     Reduce shuffle bytes=110
14/09/12 08:49:02 INFO mapred.JobClient:     Spilled Records=18
14/09/12 08:49:02 INFO mapred.JobClient:     Map output bytes=80
14/09/12 08:49:02 INFO mapred.JobClient:     Total committed heap usage (bytes)=248127488
14/09/12 08:49:02 INFO mapred.JobClient:     CPU time spent (ms)=8560
14/09/12 08:49:02 INFO mapred.JobClient:     Combine input records=9
14/09/12 08:49:02 INFO mapred.JobClient:     SPLIT_RAW_BYTES=220
14/09/12 08:49:02 INFO mapred.JobClient:     Reduce input records=9
14/09/12 08:49:02 INFO mapred.JobClient:     Reduce input groups=7
14/09/12 08:49:02 INFO mapred.JobClient:     Combine output records=9
14/09/12 08:49:02 INFO mapred.JobClient:     Physical memory (bytes) snapshot=322252800
14/09/12 08:49:02 INFO mapred.JobClient:     Reduce output records=7
14/09/12 08:49:02 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1042149376
14/09/12 08:49:02 INFO mapred.JobClient:     Map output records=9

 

 

hadoop@namenode:~/hadoop-1.2.1$ hadoop fs -cat out/*
Warning: $HADOOP_HOME is deprecated.


heheh   1
hello   2
it's    1
ll      1
the     2
think   1
why     1
cat: File does not exist: /user/hadoop/out/_logs

Source: http://blog.csdn.net/u010414066/article/details/39190693