
Streaming Command Failed! when executing MapReduce Python code in a single-node Hadoop cluster setup on CentOS 7

I have already run MapReduce Java code successfully on this machine. Now I am trying to run MapReduce code written in Python on the same machine. For this I am using hadoop_3.2.1 and hadoop-streaming-3.2.1.jar.

I tested the code locally with the command

[dsawale@localhost ~]$ cat Desktop/sample.txt | python PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py | sort | python PycharmProjects/MapReduceCode/com/code/wordcount/WordCountReducer.py

and it prints the correct output.

But when I try to run it on the Hadoop cluster with the command

[dsawale@localhost ~]$ hadoop jar Desktop/JAR/hadoop-streaming-3.2.1.jar -mapper mapper.py -reducer reducer.py -file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py -file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py -input /sample.txt -output pysamp

I get the following output:

packageJobJar: [PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py, PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py, /tmp/hadoop-unjar6715579504628929924/] [] /tmp/streamjob3211585412475799030.jar tmpDir=null
Streaming Command Failed!

This is my first Python MapReduce program. Could you help me get rid of this error? Thanks!

Configuration files: mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>

core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permission</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/dsawale/hadoop-3.2.1/hadoop2_data/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/dsawale/hadoop-3.2.1/hadoop2_data/hdfs/datanode</value>
    </property>
</configuration>

yarn-site.xml:

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

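The file paths you are passing to the -mapper and -reducer arguments are incorrect: mapper.py and reducer.py are not among the files you ship, and your -file options list WordCountMapper.py twice, so WordCountReducer.py is never shipped.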

Try:

hadoop jar Desktop/JAR/hadoop-streaming-3.2.1.jar \
-mapper PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py \
-reducer PycharmProjects/MapReduceCode/com/code/wordcount/WordCountReducer.py  \
-file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py \
-file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountReducer.py \
-input /sample.txt \
-output pysamp
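
A mismatch like this can be caught before the job is submitted. The sketch below is a hypothetical helper, check_streaming_args, and is not part of Hadoop. It assumes the command is written on a single line and that -mapper and -reducer each name one script file rather than a full command such as "python mapper.py". It only compares file names, so treat it as a rough sanity check, not a validator for every streaming option.

```python
import os
import shlex


def check_streaming_args(command, exists=os.path.isfile):
    """Return a list of problems found in a single-line streaming command.

    Hypothetical helper, not part of Hadoop. Assumes -mapper and -reducer
    each name one script file rather than a full command line.
    """
    tokens = shlex.split(command)
    opts = {"-mapper": [], "-reducer": [], "-file": []}
    # Pair every token with its successor to pick up "flag value" pairs.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in opts:
            opts[flag].append(value)

    problems = []
    shipped = opts["-file"]
    shipped_names = {os.path.basename(path) for path in shipped}

    # A path listed twice usually means another script was forgotten.
    for path in sorted(set(shipped)):
        if shipped.count(path) > 1:
            problems.append(f"-file lists {path} more than once")
        if not exists(path):
            problems.append(f"-file {path} does not exist locally")

    # Every script named by -mapper/-reducer should also be shipped.
    for flag in ("-mapper", "-reducer"):
        for script in opts[flag]:
            if os.path.basename(script) not in shipped_names:
                problems.append(f"{flag} {script} is not shipped with -file")
    return problems


if __name__ == "__main__":
    base = "PycharmProjects/MapReduceCode/com/code/wordcount/"
    failing = (
        "hadoop jar Desktop/JAR/hadoop-streaming-3.2.1.jar "
        "-mapper mapper.py -reducer reducer.py "
        f"-file {base}WordCountMapper.py -file {base}WordCountMapper.py "
        "-input /sample.txt -output pysamp"
    )
    for problem in check_streaming_args(failing):
        print(problem)
```

Run from the home directory, it flags the duplicated -file entry and the two script names that are never shipped in the command from the question. The exists parameter is there only so the file-existence check can be swapped out when the scripts are not on the local disk.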

