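I recently ran into a problem while starting a Hadoop cluster.
After starting the Hadoop processes on all nodes from the name node host, running jps on a data node host showed that the DataNode and TaskTracker processes were up.
However, when I opened http://{NameNodeIP}:50030 or http://{NameNodeIP}:50070 in a browser to check the status of each node, the data node was not listed.
So I logged in to the data node remotely and checked its log, which showed the following messages: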
2012-02-02 00:00:05,690 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 0 time(s).
2012-02-02 00:00:06,691 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 1 time(s).
2012-02-02 00:00:07,692 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 2 time(s).
2012-02-02 00:00:08,692 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 3 time(s).
2012-02-02 00:00:09,693 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 4 time(s).
2012-02-02 00:00:10,693 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 5 time(s).
2012-02-02 00:00:11,694 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 6 time(s).
2012-02-02 00:00:12,695 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 7 time(s).
2012-02-02 00:00:13,695 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 8 time(s).
2012-02-02 00:00:14,696 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 9 time(s).
2012-02-02 00:00:14,697 INFO org.apache.hadoop.ipc.RPC: Server at hadoop-c/10.0.0.54:9000 not available yet, Zzzzz...
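The name node host name and IP shown in the log above are both correct, yet the data node cannot connect to the name node. A quick search online turned up two possible causes:
1. A firewall in the network is blocking the connection.
=> Ruled out: other connections to the name node tested fine, so the firewall is not the problem.
2. The IP the name node binds to differs from the IP the data node tries to connect to.
To test this possibility, run the following command on the name node host: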
- $ netstat -nap | grep 9000
The expected result is:
- tcp6 0 0 10.0.0.54:9000 :::* LISTEN 10590/java
But the actual result was:
- tcp6 0 0 127.0.0.1:9000 :::* LISTEN 10590/java
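The comparison of the two netstat results can also be done in a small script. The following is only a sketch: the helper names `listen_addresses` and `is_loopback` are made up for illustration, and it assumes the column layout shown in the output above (local address in the fourth column, state in the sixth).

```python
import ipaddress


def listen_addresses(netstat_output, port):
    """Return the local IPs of LISTEN sockets on `port` found in `netstat -nap` text."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local address, foreign address, state, pid/program
        if len(fields) < 6 or fields[5] != "LISTEN":
            continue
        # Split on the last colon so IPv6-style addresses such as ":::9000" also work
        host, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            found.append(host)
    return found


def is_loopback(host):
    """True when `host` is a loopback IP such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False


# Sample line taken from the actual result above
sample = "tcp6 0 0 127.0.0.1:9000 :::* LISTEN 10590/java"
for host in listen_addresses(sample, 9000):
    print(host, "loopback only" if is_loopback(host) else "reachable from other hosts")
```

A listener reported as loopback only cannot be reached from the data nodes, which matches the retry messages in the log.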
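A listener bound to 127.0.0.1 means some setting is wrong. Two settings need to be checked:
(1) In conf/core-site.xml on the name node, fs.default.name was mistakenly set to localhost, as in the block below. Change it to the name node's host name or its IP address.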
- <configuration>
- <property>
- <name>fs.default.name</name>
- <value>hdfs://localhost:9000</value>
- </property>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/tmp/hadoop/hadoop-${user.name}</value>
- </property>
- </configuration>
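(2) /etc/hosts on both the name node and the data nodes must map each host name to its real IP address. A mapping such as "127.0.0.1 hadoop-c" must not appear; otherwise, wherever the host name is used in the configuration, it resolves to 127.0.0.1.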
- 10.0.0.54 hadoop-c
- 10.0.0.XX datanode
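The two checks above can be sketched as a short script. This is an illustrative sketch rather than a Hadoop tool: `fs_default_host` and `loopback_names` are hypothetical helper names, and both take the file contents as text so they can be tried on samples before pointing them at the real conf/core-site.xml and /etc/hosts.

```python
import ipaddress
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Names that normally do belong to the loopback address (assumed list, adjust as needed)
LOOPBACK_OK = ("localhost", "localhost.localdomain", "ip6-localhost", "ip6-loopback")


def fs_default_host(core_site_xml):
    """Return the host part of fs.default.name, or None if the property is absent."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "fs.default.name":
            return urlparse(prop.findtext("value", "").strip()).hostname
    return None


def loopback_names(hosts_text, ignore=LOOPBACK_OK):
    """Return host names that a hosts file maps to a loopback address."""
    bad = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then split columns
        if len(fields) < 2:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not an address line
        if addr.is_loopback:
            bad.extend(name for name in fields[1:] if name not in ignore)
    return bad


core_site = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>"""
hosts = "127.0.0.1 localhost\n127.0.0.1 hadoop-c\n10.0.0.XX datanode\n"

print("fs.default.name host:", fs_default_host(core_site))
print("host names mapped to loopback:", loopback_names(hosts))
```

If the first value is localhost or the second list contains the name node's host name, the name node will end up listening on 127.0.0.1 as seen in the netstat output.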
After making these changes, the Hadoop processes on all nodes of the cluster can be started from the name node.
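To confirm the fix from a data node without waiting for the retry messages in the log, a plain TCP connection attempt to the name node port is enough. The snippet below is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper name, and the 3-second timeout is an arbitrary choice.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and host names that do not resolve
        return False
```

Running `can_connect("hadoop-c", 9000)` on a data node should return True once the name node listens on 10.0.0.54 instead of 127.0.0.1. A False result only says the port is unreachable; it does not tell a firewall apart from a wrong bind address.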