Python MapReduce job returns an error

Question

Hi, I have just started using Hadoop and am running my first MapReduce job. I wrote the map and reduce scripts in Python and tested them, and they work fine, but when I try to run them on Hadoop, the job returns an error.

This is the command I entered in the terminal:

/home/maitreyee/hadoop$ bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar     -mapper /usr/bin/python mapper1.py -reducer /usr/bin/python reducer1.py -input /user/hduser/gutenberg/* -output /user/hduser/gutenberg-output1

And the following error appears:

Warning: $HADOOP_HOME is deprecated.
packageJobJar: [/app/hadoop/tmp/hadoop-unjar3238940252334854546/] [] /tmp/streamjob4553487258055690616.jar tmpDir=null
14/12/05 11:53:29 INFO streaming.StreamJob: Running job: job_201412050953_0004
14/12/05 11:53:29 INFO streaming.StreamJob: To kill this job, run:
14/12/05 11:53:29 INFO streaming.StreamJob: /home/maitreyee/hadoop/libexec/../bin/hadoop job  -Dmapred.job.tracker=localhost:54311 -kill job_201412050953_0004
14/12/05 11:53:29 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201412050953_0004
14/12/05 11:53:30 INFO streaming.StreamJob:  map 0%  reduce 0%
14/12/05 11:54:54 INFO streaming.StreamJob:  map 100%  reduce 100%
14/12/05 11:54:54 INFO streaming.StreamJob: To kill this job, run:
14/12/05 11:54:54 INFO streaming.StreamJob: /home/maitreyee/hadoop/libexec/../bin/hadoop job  -Dmapred.job.tracker=localhost:54311 -kill job_201412050953_0004
14/12/05 11:54:54 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201412050953_0004
14/12/05 11:54:54 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201412050953_0004_m_000000
14/12/05 11:54:54 INFO streaming.StreamJob: killJob...
Streaming Command Failed!

Kindly suggest what is going wrong and how it can be resolved.

Answer 1

It feels like it's just a matter of time. Before running jobs in Hadoop, make sure Hadoop itself is running properly by checking with jps, keep the system updated, and check the SSH connection as well. Then write the command as follows to run a simple Python MapReduce job in Hadoop (I am using Ubuntu 12.04 LTS and Hadoop 1.2.1).

 hduser@bharti-desktop:~/hadoop$ bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar -input /user/hduser/gutenberg/gutenberg/ -output /user/hduser/op4 -mapper /home/hduser/hadoop/mapper1.py -file /home/hduser/hadoop/mapper1.py -reducer /home/hduser/hadoop/reducer1.py -file /home/hduser/hadoop/reducer1.py

A short explanation of the terminal command above: since it is a streaming job, we first give the location of Hadoop's streaming jar file, then the location of the input (-input), followed by the location of the output (-output). Give the output a unique name: it is a path in HDFS, and Hadoop refuses to write to an output directory that already exists. Then -mapper and -reducer tell Hadoop which script to run for the map tasks and which for the reduce tasks, and each is followed by a -file option that gives the location of that script so it is shipped along with the job (this is needed when the mapper and reducer are written in a scripting language).
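For reference, here is a minimal sketch of what a streaming mapper and reducer do. The contents of mapper1.py and reducer1.py are not shown in the question, so a plain word count is assumed here, and the function names `mapper`, `reducer` and `run_locally` are only illustrative. Hadoop Streaming feeds input lines to the script on stdin and expects tab-separated key/value lines on stdout, with the reducer receiving its input sorted by key. Because the command above passes the script path directly to -mapper and -reducer, the real scripts would also need a shebang line and execute permission.

```python
#!/usr/bin/env python
"""Sketch of a word-count mapper and reducer in the Hadoop Streaming style.

Assumption: word count is used only as an example; the original scripts
(mapper1.py, reducer1.py) are not shown in the thread.
"""
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reducer(sorted_records):
    """Sum the counts per word; records must already be sorted by key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield "%s\t%d" % (word, total)


def run_locally(lines):
    """Imitate map -> sort -> reduce on a list of lines, without Hadoop."""
    return list(reducer(sorted(mapper(lines))))


# In an actual streaming script the same functions would be driven by
# stdin/stdout, for example in the mapper file:
#     import sys
#     for record in mapper(sys.stdin):
#         print(record)
```

Calling `run_locally` on a few sample lines is a quick way to check the logic before submitting the job; it only mirrors the map, sort and reduce order and does not reproduce cluster-side problems such as a missing interpreter or a script that was not shipped with -file.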

If there is still any doubt, kindly let me know.

Answer 1: accepted 2014-12-08 05:44:01