I recently ran into a problem while starting up a Hadoop cluster.

After starting the Hadoop processes on every node from the name node host, running the jps command on a data node showed that the DataNode and TaskTracker processes were up.

However, when I pointed a browser at http://{NameNodeIP}:50030 or http://{NameNodeIP}:50070 to check node status, the data node was nowhere to be seen.

So I logged in to the data node remotely and checked its log, which showed the following error messages:

2012-02-02 00:00:05,690 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 0 time(s).
2012-02-02 00:00:06,691 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 1 time(s).
2012-02-02 00:00:07,692 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 2 time(s).
2012-02-02 00:00:08,692 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 3 time(s).
2012-02-02 00:00:09,693 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 4 time(s).
2012-02-02 00:00:10,693 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 5 time(s).
2012-02-02 00:00:11,694 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 6 time(s).
2012-02-02 00:00:12,695 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 7 time(s).
2012-02-02 00:00:13,695 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 8 time(s).
2012-02-02 00:00:14,696 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-c/10.0.0.54:9000. Already tried 9 time(s).
2012-02-02 00:00:14,697 INFO org.apache.hadoop.ipc.RPC: Server at hadoop-c/10.0.0.54:9000 not available yet, Zzzzz...

 

The name node's host name and IP shown in the log above are both correct, yet the data node still could not connect. A bit of searching turned up two possibilities:

1. A firewall is blocking the connection.
=> Ruled out: other connections to the name node worked fine, so this cannot be a firewall problem.
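For reference, the firewall possibility can be checked directly from the data node. This is a sketch using bash's /dev/tcp pseudo-device (not part of the original post); hadoop-c and port 9000 are the name node host and RPC port from the log above.

```shell
# Hypothetical connectivity probe from the data node: if the TCP
# connection succeeds, a firewall is unlikely to be the cause.
if timeout 3 bash -c 'exec 3<>/dev/tcp/hadoop-c/9000' 2>/dev/null; then
  echo "port 9000 reachable -- firewall unlikely"
else
  echo "cannot reach port 9000"
fi
```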

2. The IP the name node binds to differs from the IP the data node tries to connect to.
To test this possibility, run the following command on the name node host:

  $ netstat -nap | grep 9000

The expected result would be:

  tcp6 0 0 10.0.0.54:9000 :::* LISTEN 10590/java

But the actual result was:

  tcp6 0 0 127.0.0.1:9000 :::* LISTEN 10590/java
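To make the loopback-vs-LAN distinction jump out, the bind address can be extracted on its own. A small sketch (not from the original post), assuming a netstat whose local address appears in column 4 as in the output above:

```shell
# Print only the address the process listening on port 9000 is bound to.
# If this prints 127.0.0.1, remote data nodes cannot reach the name node.
netstat -nap 2>/dev/null | grep ':9000 ' | awk '{print $4}' | cut -d: -f1
```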


This means something is misconfigured, and two settings need to be checked:
(1) In the name node's conf/core-site.xml, the address is mistakenly set to localhost; it should be the name node's host name or its actual IP.

  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/tmp/hadoop/hadoop-${user.name}</value>
    </property>
  </configuration>
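A corrected version might look like the following. hadoop-c is the name node's host name from the log earlier in this post; substitute your own host name or LAN IP.

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- use the name node's host name or LAN IP, not localhost -->
    <value>hdfs://hadoop-c:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/hadoop-${user.name}</value>
  </property>
</configuration>
```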


(2) In /etc/hosts on both the name node and the data nodes, every entry must map the real IP address to the host name. A mapping like "127.0.0.1 hadoop-c" must not appear; otherwise, whenever that host name is used in the configuration, it resolves to 127.0.0.1.

  10.0.0.54 hadoop-c
  10.0.0.XX datanode
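The hosts-file mapping can be verified without restarting anything. A sketch (not from the original post) using getent, which consults /etc/hosts the same way most programs do; hadoop-c is the name node host name used above:

```shell
# Check what IP the host name actually resolves to. A loopback result
# reproduces exactly the misconfiguration described in this post.
resolved=$(getent hosts hadoop-c | awk '{print $1}')
case "$resolved" in
  127.*) echo "BAD: hadoop-c resolves to loopback ($resolved)" ;;
  "")    echo "BAD: hadoop-c does not resolve at all" ;;
  *)     echo "OK: hadoop-c -> $resolved" ;;
esac
```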

 

After making these changes, the Hadoop processes on all cluster nodes start correctly from the name node.
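As a final check after the restart, the two daemons mentioned at the start of this post should both appear in the jps output on each data node. A quick one-liner sketch:

```shell
# Count the expected data node daemons (DataNode and TaskTracker) in
# the jps output; a result of 2 means both are running.
jps | grep -cE 'DataNode|TaskTracker'
```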
