Hadoop cluster hangs on Reduce > copy >
So far I have tried the solutions here, 1, and here, 2. However, while these solutions do get the MapReduce job to execute, it appears that everything then runs on the name node only, since I get output similar to this, 3.
Basically, I am running a 2-node cluster with a MapReduce algorithm of my own design. The MapReduce jar executes perfectly on a single-node cluster, which leads me to believe that something is wrong with my Hadoop multi-node configuration. To set up the multi-node cluster, I followed the tutorial here.
To report where things go wrong: when I execute my program (after checking that the NameNode, TaskTracker, JobTracker and DataNode are running on the appropriate nodes), my program stalls in the terminal at this line:
INFO mapred.JobClient: map 100% reduce 0%
If I look at the logs for the task, I see copy failed: attempt... from slave-node followed by a SocketTimeoutException.
Looking at the logs on my slave node (the DataNode) shows that execution stops at the following line:
TaskTracker: attempt... 0.0% reduce > copy >
As in the solutions from links 1 and 2, removing various IP addresses from the etc/hosts file does lead to successful execution, but I then end up with the items from link 4 in my slave-node (DataNode) log, for example:
INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381
WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.
As a new Hadoop user this looks suspicious to me, though it may be perfectly normal. To me it looks as though something is pointing at an incorrect IP address in the hosts file, and that by removing this IP address I am simply halting execution on the slave node, so that processing continues on the namenode alone (which is not advantageous at all).
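One way to narrow a problem like this down is to check, from each node, what the cluster hostnames actually resolve to, since a resolution mismatch typically produces exactly this kind of shuffle-phase SocketTimeoutException. A minimal sketch, assuming the master/slave hostnames from the hosts files below:

```python
import socket

# Hostnames assumed from the master/slave hosts files shown below.
for name in ("master", "slave"):
    try:
        print(name, "->", socket.gethostbyname(name))
    except socket.gaierror as exc:
        print(name, "-> resolution failed:", exc)
```

Run on both nodes, every node should resolve every other node to its LAN address (192.168.1.x here), not to a loopback address.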
To summarise:
Master: etc/hosts
127.0.0.1 localhost
127.0.1.1 joseph-Dell-System-XPS-L702X
#The following lines are for hadoop master/slave setup
192.168.1.87 master
192.168.1.74 slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Slave: etc/hosts
127.0.0.1 localhost
127.0.1.1 joseph-Home # this line was incorrect, it was set as 7.0.1.1
#the following lines are for hadoop mutli-node cluster setup
192.168.1.87 master
192.168.1.74 slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Master: core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
Slave: core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
Master: hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
Slave: hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
Master: mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
Slave: mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
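Before (or instead of) digging through hosts files, a quick sanity check is whether the slave can actually reach the NameNode and JobTracker ports declared in the configs above (54310 and 54311). A small sketch, to be run from the slave node:

```python
import socket

def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# 54310 = fs.default.name, 54311 = mapred.job.tracker (from the configs above).
for port in (54310, 54311):
    print("master:%d reachable: %s" % (port, port_reachable("master", port)))
```

If either port is unreachable while the daemons are running, the problem is name resolution or a firewall rather than the Hadoop configuration itself.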
The error was in etc/hosts:
During the failing runs, the slave's etc/hosts file looked like this:
127.0.0.1 localhost
7.0.1.1 joseph-Home # THIS LINE IS INCORRECT, IT SHOULD BE 127.0.1.1
#the following lines are for hadoop mutli-node cluster setup
192.168.1.87 master
192.168.1.74 slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
As you may have spotted, the IP address for this machine, 'joseph-Home', was misconfigured: it was set to 7.0.1.1 when it should have been 127.0.1.1. Changing line 2 of the slave's etc/hosts file to 127.0.1.1 joseph-Home fixed the problem, and my logs now appear normally on the slave node.
The new etc/hosts file:
127.0.0.1 localhost
127.0.1.1 joseph-Home # corrected, was 7.0.1.1
#the following lines are for hadoop mutli-node cluster setup
192.168.1.87 master
192.168.1.74 slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
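A typo like 7.0.1.1 is easy to miss by eye. As a sanity check, the IPv4 entries in etc/hosts can be scanned programmatically: in a setup like this one, every address should be either loopback or a private LAN address. A minimal sketch (the flag_suspect_entries helper is a hypothetical illustration, not part of Hadoop):

```python
import ipaddress

def flag_suspect_entries(hosts_text):
    """Return (address, names) pairs whose IPv4 address is neither loopback
    nor private -- a common sign of a typo such as 7.0.1.1 for 127.0.1.1."""
    suspects = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue
        addr, *names = line.split()
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            suspects.append((addr, names))  # not a parseable address at all
            continue
        if ip.version == 6:
            continue  # leave the stock IPv6 entries alone
        if not (ip.is_loopback or ip.is_private):
            suspects.append((addr, names))
    return suspects

broken = """\
127.0.0.1 localhost
7.0.1.1 joseph-Home
192.168.1.87 master
192.168.1.74 slave
"""
print(flag_suspect_entries(broken))  # only the 7.0.1.1 typo is reported
```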
A tested solution is to add the following property to hadoop-env.sh and then restart all Hadoop cluster services:
hadoop-env.sh
export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
I ran into this problem today as well. In my case, the issue was that the disk of one node in the cluster was full, so Hadoop could not write log files to local disk. A possible way to solve this is therefore to delete some unused files from the local disk. Hope this helps.
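To spot that condition quickly, the standard-library disk_usage call is enough. A small sketch (the 95% threshold and the paths checked are illustrative assumptions, not Hadoop defaults):

```python
import shutil

def nearly_full(path, threshold=0.95):
    """Return True if the filesystem containing `path` is above `threshold` used."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total > threshold

# Check the root filesystem plus hadoop.tmp.dir from the configs above.
for path in ("/", "/home/hduser/tmp"):
    try:
        print(path, "nearly full:", nearly_full(path))
    except FileNotFoundError:
        print(path, "does not exist on this node")
```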