Hadoop MapReduce Wordcount Python execution error

Question

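I am trying to run a Python MapReduce wordcount program.

I took it from "Writing an Hadoop MapReduce Program in Python" just to understand how it works, but the job never succeeds!

I run mapper.py and reducer.py on the Cloudera VM using this library: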

/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar

Command:

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar
-Dmaperd.reduce, tasks=1
-file wordcount/mapper.py 
-mapper mapper.py -file wordcount/reducer.py
-reducer reducer.py
-input myinput/test.txt
-output output


Answer 1

The problem is that the paths to mapper.py and reducer.py must be local paths,

while the input file must be given as an HDFS path.

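First, test the Python code locally with the following command; the sort step stands in for the sorting Hadoop performs between the map and reduce phases:

cat <input file> | python <path from local>/mapper.py | sort -k1,1 | python <path from local>/reducer.py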
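The mapper.py and reducer.py from the question are not shown here, so the following is only a minimal sketch of what a streaming word count typically does, written as plain functions. The names `map_lines`, `reduce_pairs`, and `run_local` are hypothetical. It imitates the mapper, a sort, and the reducer in a single process, where the sort plays the role of the sorting Hadoop does between the map and reduce phases.

```python
from itertools import groupby
from operator import itemgetter


def map_lines(lines):
    """Mapper step: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_pairs(sorted_pairs):
    """Reducer step: sum the counts of consecutive pairs that share a word.

    Like a streaming reducer, this relies on its input being sorted by key.
    """
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Imitate map -> sort -> reduce locally in one process."""
    return list(reduce_pairs(sorted(map_lines(lines))))


if __name__ == "__main__":
    sample = ["foo bar foo", "bar baz"]
    for word, count in run_local(sample):
        # Streaming scripts conventionally write tab-separated key/value lines.
        print(f"{word}\t{count}")
```

If the reducer is fed unsorted pairs, the same word can appear more than once in the output, which is why a local test without a sort step can look wrong even when the scripts themselves are fine.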

Then run the job on Hadoop, with the input read from HDFS:

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar \
  -D mapred.reduce.tasks=1 \
  -file <path from local>/mapper.py \
  -mapper "python <path from local>/mapper.py" \
  -file <path from local>/reducer.py \
  -reducer "python <path from local>/reducer.py" \
  -input <path from hdfs>/myinput/test.txt \
  -output output
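When the job is launched from a script, building the command as an argument list can help avoid quoting and line-break mistakes. The sketch below is only an illustration under assumptions: `build_streaming_command` is a hypothetical helper, the script and input paths are placeholders taken from the question, and the reducer count uses the MR1 property name `mapred.reduce.tasks`. The `-mapper` and `-reducer` values mirror the form used in this answer.

```python
import shlex

# Streaming jar path as given in the question (Cloudera VM, CDH 5.12).
STREAMING_JAR = (
    "/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/"
    "hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar"
)


def build_streaming_command(mapper, reducer, hdfs_input, hdfs_output, reducers=1):
    """Return the argv list for a Hadoop Streaming job.

    mapper and reducer are LOCAL script paths; hdfs_input and hdfs_output
    are HDFS paths. All four are placeholders supplied by the caller.
    """
    return [
        "hadoop", "jar", STREAMING_JAR,
        # Generic options such as -D go before the streaming options.
        "-D", f"mapred.reduce.tasks={reducers}",
        "-file", mapper,
        "-mapper", f"python {mapper}",
        "-file", reducer,
        "-reducer", f"python {reducer}",
        "-input", hdfs_input,
        "-output", hdfs_output,
    ]


if __name__ == "__main__":
    cmd = build_streaming_command(
        "wordcount/mapper.py", "wordcount/reducer.py", "myinput/test.txt", "output"
    )
    # Print a shell-safe rendering for inspection; nothing is executed here.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

On a machine where the `hadoop` command is available, the returned list could be passed to `subprocess.run(cmd, check=True)`; that step is left out here so the sketch runs anywhere.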

Answer 1: 2, accepted, 2017-11-01 15:51:36
