簡體   English   中英

Hadoop集群-運行作業之前,我是否需要在所有計算機上復制代碼?

[英]Hadoop cluster - Do I need to replicate my code over all machines before running job?

這就是讓我感到困惑的地方,當我使用單詞計數示例時,我將代碼保留在master上,讓他與slave一起做,並且運行良好

但是,當我運行我的代碼時,它在從屬設備上開始失敗,並給出諸如以下的奇怪錯誤

Traceback (most recent call last):
  File "/app/hadoop/tmp/mapred/local/taskTracker/hduser/jobcache/job_201110250901_0005/attempt_201110250901_0005_m_000001_1/work/./mapper.py", line 55, in <module>
    from src.utilities import utilities
ImportError: No module named src.utilities
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
        at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:121)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:255)
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:255)  

因為我在路徑上沒有代碼,所以我做錯什么了嗎?

謝謝

使用Hadoop Streaming,如果目標計算機上不存在代碼/依賴項,則必須使用-file標志將其復制。 確保在Hadoop流命令中指定了map / reduce文件及其依賴項。

$HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper myPythonScript.py \
    -reducer /bin/wc \
    -file myPythonScript.py \
    -file myDictionary.txt \

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM