
Hadoop Streaming Job - python stuck at map 0% reduce 0% in CDH4.5

I am using a Hadoop streaming job on Cloudera Distribution 4.5, but it does not advance beyond the map 0% stage. I am also not sure where to find the logs I should check; pardon my naivety with Hadoop.

[amgen@sa-dpoc10 code]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.5.0.jar \
    -mapper /home/amgen/Amgen_UC1/code/mapper.py \
    -file /home/amgen/Amgen_UC1/code/mapper.py \
    -reducer /home/amgen/Amgen_UC1/code/reducer.py \
    -file /home/amgen/Amgen_UC1/code/reducer.py \
    -input /user/amgen/Amgen_UC1/input/Corpus_VoiceBase.txt \
    -output /user/amgen/Amgen_UC1/output_t1
packageJobJar: [/home/amgen/Amgen_UC1/code/mapper.py,/home/amgen/Amgen_UC1/code/reducer.py, /tmp/hadoop-amgen/hadoop-unjar665443284079561966/] [] /tmp/streamjob722830427268220086.jar tmpDir=null
14/02/02 07:16:52 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/02/02 07:16:53 INFO mapred.FileInputFormat: Total input paths to process : 1
14/02/02 07:16:53 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-amgen/mapred/local]
14/02/02 07:16:53 INFO streaming.StreamJob: Running job: job_201401231022_0068
14/02/02 07:16:53 INFO streaming.StreamJob: To kill this job, run:
14/02/02 07:16:53 INFO streaming.StreamJob: UNDEF/bin/hadoop job  -Dmapred.job.tracker=sa-dpoc16.zs.local:8021 -kill job_201401231022_0068
14/02/02 07:16:53 INFO streaming.StreamJob: Tracking URL: http://sa-dpoc16.zs.local:50030/jobdetails.jsp?jobid=job_201401231022_0068
14/02/02 07:16:54 INFO streaming.StreamJob:  map 0%  reduce 0%

Please let me know if you need any configuration files.
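For context, the question's actual mapper.py and reducer.py are not shown, so the following is a hypothetical stand-in showing the shape a streaming mapper must have. A missing shebang line, or a script file without the executable bit set, is a common reason a streaming job stalls at map 0% with no obvious error.

```python
#!/usr/bin/env python
"""Minimal Hadoop Streaming mapper sketch (hypothetical word count --
the question's real mapper.py is not shown). The first line must be a
shebang so the TaskTracker can execute the script directly."""
import sys

def map_line(line):
    """Emit (word, 1) pairs for one input line."""
    return [(word, 1) for word in line.strip().split()]

if __name__ == "__main__":
    # Streaming protocol: read raw text lines from stdin, write
    # tab-separated key/value pairs to stdout.
    for line in sys.stdin:
        for key, value in map_line(line):
            sys.stdout.write("%s\t%d\n" % (key, value))
```

Also make sure both scripts are executable (`chmod +x mapper.py reducer.py`) before submitting the job.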

You can check the NameNode logs through the NameNode UI:

http://yourdomain.com:50070/dfshealth.jsp

There you will find a hyperlink to the NameNode logs, which opens a list of log files and XMLs. Job logs are usually under the userlogs folder.

You can also track jobs using the JobTracker UI:

http://yourdomain.com:50030/jobtracker.jsp

The job output above already includes a link to the job details (the "Tracking URL" line).

There you can see whether the mappers are failing, and view each mapper's stdout and stderr to check for Python exceptions.
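A cheap way to surface Python exceptions before they disappear into task logs is to emulate the streaming pipeline (mapper, then shuffle-sort, then reducer) locally on a sample of the input. This is a sketch under the assumption of word-count logic; the question's real mapper/reducer logic is unknown.

```python
def mapper(lines):
    """Stand-in mapper: emit (word, 1) for each word."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reducer(sorted_pairs):
    """Streaming-style reducer: sum each run of identical keys.
    Assumes pairs arrive sorted by key, as Hadoop guarantees
    between the map and reduce phases."""
    current, total = None, 0
    for key, value in sorted_pairs:
        if key != current:
            if current is not None:
                yield (current, total)
            current, total = key, 0
        total += value
    if current is not None:
        yield (current, total)

def run_local(lines):
    """Emulate: cat input | mapper | sort | reducer."""
    return dict(reducer(sorted(mapper(lines))))
```

The equivalent shell smoke test is `cat sample.txt | ./mapper.py | sort | ./reducer.py`; if that command raises a Python exception, the cluster job will fail the same way.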
