
Hadoop Streaming Command Failed: Job not Successful

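I am new to using Hadoop Streaming with Python. I was able to run the wordcount example described in most references successfully. However, when I use one of my own small Python scripts, the job fails with an error, even though the code does almost nothing.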

The relevant part of the error output is:

    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
14/12/13 01:47:31 INFO mapred.LocalJobRunner: map task executor complete.
14/12/13 01:47:31 WARN mapred.LocalJobRunner: job_local174189774_0001
java.lang.Exception: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
14/12/13 01:47:32 INFO mapreduce.Job: Job job_local174189774_0001 failed with state FAILED due to: NA
14/12/13 01:47:32 INFO mapreduce.Job: Counters: 0
14/12/13 01:47:32 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!

The map.py file is as follows:

    import sys
    for line in sys.stdin:
        line = line.strip()
        review_lines = line.split('\n')
        for r in review_lines:
            review = r.split('\t')
            print '%s\t%s' % (review[0], review[1])

The red.py file is as follows:

    import sys
    for line in sys.stdin:
        line = line.strip()
        word = line.split('\t')
        print '%s\t%d' % (word[0], int(word[1]) % 2)

The input I am providing is (input_file.txt):

    R1      1
    R2      5
    R3      3
    R4      2
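
The listing above does not show whether the two columns are separated by tabs or by spaces, and both scripts split on `'\t'`. A minimal sketch of the difference, in case it is worth ruling out; `parse_record` is a hypothetical helper and not part of the original scripts:

```python
def parse_record(line):
    """Split a line into (key, value) on any run of whitespace.

    Hypothetical helper, not from the original scripts. Returns None
    for lines that do not contain at least two fields.
    """
    fields = line.split()
    if len(fields) < 2:
        return None
    return fields[0], fields[1]


tab_line = "R1\t1\n"
space_line = "R1      1\n"

# split('\t') only separates the fields when a real tab is present.
tab_fields = tab_line.strip().split('\t')
space_fields = space_line.strip().split('\t')

# split() with no argument handles both tabs and runs of spaces.
parsed_tab = parse_record(tab_line)
parsed_space = parse_record(space_line)
```

If the file turns out to be space-separated, `review[1]` in map.py would raise an `IndexError`, because `split('\t')` returns a single-element list for such a line.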

The command used to run the job is:

    hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar -file /home/hduser/map.py -mapper /home/hduser/map.py -file /home/hduser/red.py -reducer /home/hduser/red.py -input /user/hduser/input_file.txt -output /user/hduser/output_file.txt
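
Before submitting a job like this, the mapper and reducer logic can be checked locally. The following is a minimal sketch, assuming the logic of map.py and red.py is wrapped in functions and that a plain `sorted()` call stands in for Hadoop's shuffle-and-sort phase. The names `run_mapper`, `run_reducer`, and `simulate` are hypothetical, and the sample input is assumed to be tab-separated:

```python
def run_mapper(lines):
    """Mimic map.py: emit 'key<TAB>value' for each tab-separated line."""
    out = []
    for line in lines:
        review = line.strip().split('\t')
        if len(review) < 2:
            continue  # skip malformed lines; the original would raise IndexError
        out.append('%s\t%s' % (review[0], review[1]))
    return out


def run_reducer(lines):
    """Mimic red.py: emit 'key<TAB>(value mod 2)' for each mapper line."""
    out = []
    for line in lines:
        word = line.strip().split('\t')
        out.append('%s\t%d' % (word[0], int(word[1]) % 2))
    return out


def simulate(lines):
    """Map, sort the mapper output (stand-in for the shuffle), then reduce."""
    return run_reducer(sorted(run_mapper(lines)))


# Assumption: the real input file uses tabs between the two columns.
sample = ["R1\t1\n", "R2\t5\n", "R3\t3\n", "R4\t2\n"]
result = simulate(sample)
```

This checks the parsing logic only. It does not test whether the script files themselves can be launched as programs by the streaming task.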

Could you try putting one of these lines at the top of each script?

    #!/usr/bin/env python
    #!/usr/bin/python

It worked for me.
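
Because the command passes the script paths directly to `-mapper` and `-reducer`, each file is launched as a program in its own right, which generally requires an interpreter line and execute permission. A minimal diagnostic sketch for checking both; `has_shebang` and `is_executable` are hypothetical helpers, and a temporary file stands in for map.py:

```python
import os
import stat
import tempfile


def has_shebang(path):
    """Return True if the first two bytes of the file are '#!'."""
    with open(path, 'rb') as f:
        return f.read(2) == b'#!'


def is_executable(path):
    """Return True if the owner-execute permission bit is set."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


# Demonstration on a temporary file standing in for map.py.
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as tmp:
    tmp.write('#!/usr/bin/env python\nimport sys\n')
    demo_path = tmp.name

shebang_ok = has_shebang(demo_path)

# Equivalent of running `chmod u+x` on the file.
os.chmod(demo_path, os.stat(demo_path).st_mode | stat.S_IXUSR)
exec_ok = is_executable(demo_path)

os.remove(demo_path)
```

Another commonly used option is to name the interpreter explicitly in the command, for example `-mapper "python map.py"`, so the job does not depend on the interpreter line.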

