Hadoop MapReduce WordCount Python execution error
I am trying to run a Python MapReduce word count program.
I took it from "Writing an Hadoop MapReduce Program in Python" just to understand how it works, but the job never succeeds!
I am working in the Cloudera VM and running mapper.py and reducer.py with this Hadoop Streaming JAR:
/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar
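The question does not include the scripts themselves. Assuming they follow the tutorial's usual pattern, Hadoop Streaming passes input lines to the mapper on stdin and reads tab-separated `key<TAB>value` records back from its stdout. The core of such a mapper might look like the following minimal sketch. `map_records` is a hypothetical name. The code avoids f-strings and type hints so that it should also run if the VM's default `python` is 2.x, which is itself an assumption about the VM.

```python
def map_records(lines):
    """Yield one "word<TAB>1" record per whitespace-separated word.

    Hadoop Streaming feeds input lines on stdin and treats everything
    before the first tab of each output line as the key.
    """
    for line in lines:
        for word in line.split():
            yield word + "\t1\n"


# In mapper.py the entry point would be (read stdin, write stdout):
#     import sys
#     sys.stdout.writelines(map_records(sys.stdin))
```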
Command executed:
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar \
-D mapred.reduce.tasks=1 \
-file wordcount/mapper.py \
-mapper mapper.py -file wordcount/reducer.py \
-reducer reducer.py \
-input myinput/test.txt \
-output output
The problem was that the paths to mapper.py and reducer.py must be local paths, while the input file must be given as an HDFS path.
First, test the Python code locally with the following command:
cat <input file> | python <local path>/mapper.py | sort -k1,1 | python <local path>/reducer.py
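When testing locally, the mapper output has to be sorted by key before it reaches the reducer (for example with `sort -k1,1` between the two `python` commands), because on the cluster Hadoop's shuffle performs that sort between the map and reduce phases. A hypothetical reducer in the tutorial's style shows why: it only merges adjacent identical words. The function names are illustrative, and the input is assumed to be `word<TAB>count` lines as produced by a word count mapper.

```python
from itertools import groupby


def parse_record(line):
    """Split a "word<TAB>count" record into (word, count)."""
    word, count = line.rstrip("\n").split("\t", 1)
    return word, int(count)


def reduce_records(lines):
    """Yield "word<TAB>total" for each run of identical adjacent words.

    groupby only merges consecutive keys, so the totals are correct only
    if the input is sorted by word, which Hadoop's shuffle (or a local
    `sort -k1,1`) provides.
    """
    records = (parse_record(line) for line in lines if line.strip())
    for word, group in groupby(records, key=lambda rec: rec[0]):
        total = sum(count for _, count in group)
        yield "%s\t%d\n" % (word, total)


# In reducer.py the entry point would be:
#     import sys
#     sys.stdout.writelines(reduce_records(sys.stdin))

mapped = ["the\t1\n", "cat\t1\n", "the\t1\n"]
print(list(reduce_records(sorted(mapped))))  # ['cat\t1\n', 'the\t2\n']
print(list(reduce_records(mapped)))          # ['the\t1\n', 'cat\t1\n', 'the\t1\n']
```

Without the sort, "the" is reported twice with a count of 1 instead of once with a count of 2.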
Then run it on Hadoop, reading the input from HDFS:
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar \
-D mapred.reduce.tasks=1 \
-file <local path>/mapper.py \
-mapper "python <local path>/mapper.py" \
-file <local path>/reducer.py \
-reducer "python <local path>/reducer.py" \
-input <HDFS path>/myinput/test.txt \
-output output
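If the job is launched from a script, the same invocation can be assembled in Python. This is a sketch under stated assumptions: `build_streaming_cmd` and `run_job` are hypothetical helpers, and the JAR path is the one from the question. It makes the answer's distinction explicit: the script paths are checked on the local filesystem, while the input and output paths are handed to Hadoop unchanged and resolved in HDFS.

```python
import os
import subprocess

# JAR path taken from the question; other CDH versions use a different file.
STREAMING_JAR = ("/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/"
                 "hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar")


def build_streaming_cmd(mapper, reducer, hdfs_input, hdfs_output, reducers=1):
    """Return the argv list for a Hadoop Streaming word count job.

    `mapper` and `reducer` are paths on the LOCAL filesystem; they are
    checked here and shipped to the tasks with -file. `hdfs_input` and
    `hdfs_output` are HDFS paths and are passed to Hadoop unchanged.
    """
    for script in (mapper, reducer):
        if not os.path.isfile(script):
            raise IOError("local script not found: %s" % script)
    return [
        "hadoop", "jar", STREAMING_JAR,
        # Generic options such as -D must precede the streaming options.
        "-D", "mapred.reduce.tasks=%d" % reducers,
        "-file", mapper, "-mapper", "python %s" % mapper,
        "-file", reducer, "-reducer", "python %s" % reducer,
        "-input", hdfs_input,
        "-output", hdfs_output,
    ]


def run_job(cmd):
    """Run the job; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.check_call(cmd)
```

Calling `run_job(build_streaming_cmd(...))` with real paths submits the job. Relative HDFS paths such as `output` are resolved against the user's HDFS home directory.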