Hadoop MapReduce function is giving an error: Streaming Command Failed
"Streaming Command Failed!" when executing MapReduce Python code on a single-node Hadoop cluster set up on CentOS 7
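I have already run MapReduce Java code successfully on this machine. Now I am trying to run MapReduce code written in Python on the same machine. For this I am using Hadoop 3.2.1 and hadoop-streaming-3.2.1.jar.
I tested the code locally with this command: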
[dsawale@localhost ~]$ cat Desktop/sample.txt | python PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py | sort | python PycharmProjects/MapReduceCode/com/code/wordcount/WordCountReducer.py
It produces the correct output.
But when I try to run it on the Hadoop cluster with this command:
[dsawale@localhost ~]$ hadoop jar Desktop/JAR/hadoop-streaming-3.2.1.jar -mapper mapper.py -reducer reducer.py -file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py -file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py -input /sample.txt -output pysamp
I get this output:
packageJobJar: [PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py, PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py, /tmp/hadoop-unjar6715579504628929924/] [] /tmp/streamjob3211585412475799030.jar tmpDir=null
Streaming Command Failed!
This is my first Python MapReduce program. Can you help me get rid of this error? Thanks!
Configuration files: mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
</configuration>
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permission</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/dsawale/hadoop-3.2.1/hadoop2_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/dsawale/hadoop-3.2.1/hadoop2_data/hdfs/datanode</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
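The file paths you passed to the -mapper and -reducer arguments are incorrect. Your command names mapper.py and reducer.py, which are not the files you ship, and both -file options point to WordCountMapper.py, so WordCountReducer.py is never sent with the job.
Try: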
hadoop jar Desktop/JAR/hadoop-streaming-3.2.1.jar \
-mapper PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py \
-reducer PycharmProjects/MapReduceCode/com/code/wordcount/WordCountReducer.py \
-file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py \
-file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountReducer.py \
-input /sample.txt \
-output pysamp
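For reference, the logic of a streaming word-count mapper and reducer can be sketched roughly as below. The contents of WordCountMapper.py and WordCountReducer.py were not posted, so the function names map_lines and reduce_lines and the tab-separated "word<TAB>count" record format are assumptions for illustration, not the original code. The last lines imitate the local cat | mapper | sort | reducer test on an in-memory sample instead of stdin.

```python
#!/usr/bin/env python
"""Illustrative word-count logic in the style of Hadoop Streaming scripts."""
from itertools import groupby


def map_lines(lines):
    """Mapper step: emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_lines(lines):
    """Reducer step: sum the counts per word.

    The input must already be sorted by word, which is what the shuffle
    phase (or `sort` in a local pipeline) provides.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


if __name__ == "__main__":
    # Hypothetical sample input standing in for sample.txt.
    sample = ["hello world", "hello hadoop"]
    for record in reduce_lines(sorted(map_lines(sample))):
        print(record)
```

In real streaming scripts each step would read sys.stdin and print its records. When a script is passed directly to -mapper or -reducer, it generally also needs a shebang line like the one above and execute permission (chmod +x), or it can be invoked explicitly, for example -mapper "python WordCountMapper.py".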