So far I have tried the solutions from links 1 and 2. While they do result in the MapReduce task being carried out, it appears the task only runs on the name node, as I get output similar to link 3.
Basically, I am running a two-node cluster with a MapReduce algorithm that I designed myself. The MapReduce jar executes perfectly on a single-node cluster, which leads me to think that something is wrong with my Hadoop multi-node configuration. To set up multi-node, I followed the tutorial here.
To report what is going wrong: when I execute my program (after checking that the NameNode, TaskTrackers, JobTracker, and DataNodes are running on the respective nodes), it halts with this line in the terminal:
INFO mapred.JobClient: map 100% reduce 0%
If I take a look at the logs for the task, I see
copy failed: attempt... from slave-node
followed by a SocketTimeoutException.
Taking a look at the logs on my slave node (DataNode) shows that execution halts at the following line:
TaskTracker: attempt... 0.0% reduce > copy >
As the solutions in links 1 and 2 suggest, removing various IP addresses from the etc/hosts file results in successful execution; however, I then end up with items such as those in link 4 in my slave-node (DataNode) log, for example:
INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381
WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.
This looks suspect to me as a new Hadoop user, but it may be perfectly normal. To me it looks as though something was pointing to the incorrect IP address in the hosts file, and that by removing that IP address I simply halt execution on the slave node, so processing continues on the name node instead (which isn't really advantageous at all).
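One way to sanity-check this kind of symptom is to see what each cluster hostname actually resolves to on each node. The sketch below assumes the master and slave hostnames from my setup; a loopback answer for a remote peer would explain the copy-phase timeouts:

```shell
# Sketch: verify how each cluster hostname resolves on this node.
# "master" and "slave" are the hostnames used in the etc/hosts files below.
check_host() {
    resolved=$(getent hosts "$1" | awk '{print $1; exit}')
    echo "$1 -> ${resolved:-UNRESOLVED}"
    case "$resolved" in
        # A loopback answer means tasks addressed to this name will try to
        # connect to the local machine, not the intended peer.
        127.*|::1) echo "  warning: $1 resolves to a loopback address" ;;
    esac
}
check_host master
check_host slave
check_host "$(hostname)"
```

Run on both nodes; master and slave should each come back as their 192.168.1.x addresses everywhere.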
To sum up:
Master: etc/hosts
127.0.0.1 localhost
127.0.1.1 joseph-Dell-System-XPS-L702X
#The following lines are for hadoop master/slave setup
192.168.1.87 master
192.168.1.74 slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Slave: etc/hosts
127.0.0.1 localhost
127.0.1.1 joseph-Home # this line was incorrect, it was set as 7.0.1.1
#the following lines are for hadoop multi-node cluster setup
192.168.1.87 master
192.168.1.74 slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Master: core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri’s scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri’s authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
Slave: core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri’s scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri’s authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
Master: hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
Slave: hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
Master: mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If “local”, then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
Slave: mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If “local”, then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
The error is in etc/hosts:
During the erroneous runs, the slave etc/hosts file looked like this:
127.0.0.1 localhost
7.0.1.1 joseph-Home # THIS LINE IS INCORRECT, IT SHOULD BE 127.0.1.1
#the following lines are for hadoop multi-node cluster setup
192.168.1.87 master
192.168.1.74 slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
As you may have spotted, the IP address of this computer, 'joseph-Home', was incorrectly configured: it was set to 7.0.1.1 when it should have been 127.0.1.1. Changing line 2 of the slave etc/hosts file to 127.0.1.1 joseph-Home
fixed the issue, and my logs now appear normally on the slave node.
New etc/hosts file:
127.0.0.1 localhost
127.0.1.1 joseph-Home # corrected from 7.0.1.1
#the following lines are for hadoop multi-node cluster setup
192.168.1.87 master
192.168.1.74 slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
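The one-line correction can also be scripted. The following is a sketch run against a scratch copy; to apply it for real, point HOSTS_FILE at /etc/hosts and run the sed with sudo (7.0.1.1 is the bad value from my file):

```shell
# Demo on a scratch copy of the hosts file.
HOSTS_FILE=$(mktemp)
printf '127.0.0.1 localhost\n7.0.1.1 joseph-Home\n192.168.1.87 master\n' > "$HOSTS_FILE"
# Rewrite the bad loopback entry in place, keeping a .bak backup.
sed -i.bak 's/^7\.0\.1\.1\([[:space:]]\)/127.0.1.1\1/' "$HOSTS_FILE"
grep '^127.0.1.1' "$HOSTS_FILE"   # -> 127.0.1.1 joseph-Home
```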
A tested solution is to add the property below to hadoop-env.sh and restart all Hadoop cluster services.
hadoop-env.sh
export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
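Applying this could look roughly as follows. The demo below works on a scratch conf directory; setting CONF_DIR to the real Hadoop conf directory (e.g. /usr/local/hadoop/conf, an assumed path) would apply it for real:

```shell
# Demo on a scratch conf dir; point CONF_DIR at your real Hadoop conf dir.
CONF_DIR=${CONF_DIR:-$(mktemp -d)}
touch "$CONF_DIR/hadoop-env.sh"
LINE='export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"'
# Append only if the line is absent, so repeated runs stay idempotent.
grep -qxF "$LINE" "$CONF_DIR/hadoop-env.sh" || echo "$LINE" >> "$CONF_DIR/hadoop-env.sh"
# Then restart the daemons, e.g. stop-all.sh && start-all.sh on Hadoop 1.x.
```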
I also met this problem today. In my case, the disk of one node in the cluster was full, so Hadoop could not write its log files to the local disk. A possible solution is to delete some unused files from that disk. Hope it helps.
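To check for this condition, something along these lines works; the log directory is the common default for a tutorial-style install and is an assumption:

```shell
# Flag filesystems that are nearly full (>= 90% used).
df -hP | awk 'NR > 1 && $5+0 >= 90 { print "nearly full:", $6, $5 }'
# List the ten largest items under the Hadoop log directory (path assumed).
du -ah /usr/local/hadoop/logs 2>/dev/null | sort -rh | head -n 10
```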