

Hadoop MapReduce function is giving an error. Streaming Command Failed

I saved the mapper.py, reducer.py, and count_word_data.txt files in the C:\BigData\Hadoop-3.2.2 directory.

Initial commands:

hadoop-3.2.2\bin\>hdfs fs -mkdir input

hadoop-3.2.2\bin\>hdfs fs -copyFromLocal C:\BigData\Hadoop-3.2.2\count_word_data.txt /input

mapper.py

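#!/usr/bin/python -O

import sys

for line in sys.stdin:
    line = line.strip()
    keys = line.split()
    for key in keys:
        value = 1
        # Separate key and value with a tab: Hadoop Streaming splits on the first
        # tab, and reducer.py splits each line on "\t".
        print("{}\t{}".format(key, value))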

reducer.py

#!/usr/bin/python -O

import sys

last_key = None
running_total = 0

for input_line in sys.stdin:
    input_line = input_line.strip()
    if not input_line:
        continue  # skip blank lines, which would break the split below
    this_key, value = input_line.split("\t", 1)
    value = int(value)

    if last_key == this_key:
        running_total += value
    else:
        if last_key:
            print("%s\t%d" % (last_key, running_total))
        running_total = value
        last_key = this_key

# Flush the final key; testing last_key avoids a NameError on empty input.
if last_key:
    print("%s\t%d" % (last_key, running_total))
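To check the mapper and reducer logic without Hadoop, you can emulate the streaming pipeline (map, sort by key, reduce) in plain Python. This is a minimal sketch, not the asker's code. The helpers run_mapper, run_reducer, and simulate are hypothetical names that mirror mapper.py and reducer.py, assuming tab-separated key/value lines. Instead of the manual running-total loop, the reducer uses itertools.groupby, which gives the same result on sorted input.

```python
import io
from itertools import groupby


def run_mapper(lines):
    """Emit one 'word<TAB>1' line per word, like a streaming word-count mapper."""
    for line in lines:
        for key in line.strip().split():
            yield "{}\t{}".format(key, 1)


def run_reducer(lines):
    """Sum the counts per key; assumes lines arrive sorted by key, as Hadoop guarantees."""
    records = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(records, key=lambda kv: kv[0]):
        yield "{}\t{}".format(key, sum(int(v) for _, v in group))


def simulate(text):
    """Equivalent in shape to: type file | python mapper.py | sort | python reducer.py"""
    mapped = run_mapper(io.StringIO(text))
    # The shuffle/sort phase: bring identical keys next to each other.
    shuffled = sorted(mapped, key=lambda line: line.split("\t", 1)[0])
    return list(run_reducer(shuffled))


if __name__ == "__main__":
    print(simulate("a b a\nb c\n"))  # ['a\t2', 'b\t2', 'c\t1']
```

If the mapper emits a space instead of a tab, the reducer's split("\t", 1) raises a ValueError here as well. That makes separator mistakes easy to catch before submitting a job.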

To run these Python files, I used this Hadoop command:

Hadoop jar C:/BigData/hadoop-3.2.2/share/Hadoop/tools/lib/hadoop-streaming-3.2.2.jar -mapper "python C:/BigData/hadoop-3.2.2/mapper.py" -reducer "C:/BigData/hadoop-3.2.2/reducer.py" -input "input/count_word_data.txt" -output "input/output"

After that, it gives me the error below.

I have specified all the paths (Python, Hadoop, Java, Spark) correctly in the system environment variables, so I don't know why I am getting this error.

2022-02-10 18:33:09,065 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
2022-02-10 18:33:09,065 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
2022-02-10 18:33:09,066 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
2022-02-10 18:33:09,075 ERROR streaming.PipeMapRed: configuration exception
java.io.IOException: Cannot run program "python": CreateProcess error=2, The system cannot find the file specified
        at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
        at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
        at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav

The system cannot find the file specified

This problem is not directly related to Hadoop.

Either:

  1. The error indicates that python is not installed (or is not on your Windows PATH). To fix this, make sure you can run python -V and that it prints the installed version.
  2. It is referring to your mapper/reducer scripts, which need to be uploaded to HDFS, where the command actually runs. You can do this with the additional option -files C:\path\to\mapper.py,C:\path\to\reducer.py.
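One way to check item 1 is to resolve "python" the way a process launcher does, then run python -V. This is a minimal standard-library sketch using shutil.which and subprocess.run. The helper names find_interpreter and interpreter_version are hypothetical. The check only covers the environment of the shell you run it from. Hadoop task processes may be launched with a different PATH, so a passing check here is necessary but may not be sufficient.

```python
import shutil
import subprocess
import sys


def find_interpreter(name="python"):
    """Return the full path `name` resolves to, or None if it cannot be found.

    On Windows, shutil.which also tries the PATHEXT extensions, so "python"
    matches python.exe. A name that contains a directory is checked directly.
    """
    return shutil.which(name)


def interpreter_version(name="python"):
    """Run `<name> -V` and return the version string, or None if it cannot be found."""
    path = find_interpreter(name)
    if path is None:
        return None
    result = subprocess.run([path, "-V"], capture_output=True, text=True)
    # Python 3.4+ prints the version to stdout; older versions wrote it to stderr.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    print("this check runs under:", sys.executable)
    print("'python' resolves to:", find_interpreter())
    print("version:", interpreter_version())
```

If "'python' resolves to:" prints None, the task launcher would most likely fail with the same CreateProcess error=2 shown in the question.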

You also do not need quotes around your input or output paths.
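One way to sidestep quoting entirely is to assemble the streaming command as an argument list, with each element being exactly one argument. Below is a minimal sketch. The function build_streaming_command is hypothetical, and the paths come from the question (using the standard share/hadoop/tools/lib layout). It places the generic -files option before the streaming options. Because -files ships both scripts with the job, the -mapper and -reducer commands refer to them by file name only. PureWindowsPath is used here only to take the file name, because it accepts both / and \ separators.

```python
from pathlib import PureWindowsPath


def build_streaming_command(streaming_jar, mapper, reducer, input_path, output_path,
                            python="python"):
    """Return the `hadoop jar <streaming jar> ...` command as a list of arguments.

    Generic options such as -files must precede streaming options such as -mapper.
    """
    mapper_name = PureWindowsPath(mapper).name
    reducer_name = PureWindowsPath(reducer).name
    return [
        "hadoop", "jar", streaming_jar,
        "-files", ",".join([mapper, reducer]),        # ship both scripts with the job
        "-mapper", "{} {}".format(python, mapper_name),
        "-reducer", "{} {}".format(python, reducer_name),
        "-input", input_path,
        "-output", output_path,
    ]


if __name__ == "__main__":
    cmd = build_streaming_command(
        "C:/BigData/hadoop-3.2.2/share/hadoop/tools/lib/hadoop-streaming-3.2.2.jar",
        "C:/BigData/hadoop-3.2.2/mapper.py",
        "C:/BigData/hadoop-3.2.2/reducer.py",
        "input/count_word_data.txt",
        "input/output",
    )
    print(cmd)
```

The list form also makes the reducer invocation explicit ("python reducer.py"), matching how the mapper is invoked.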


I will also point out that PySpark is arguably a better/easier way to do a word count, and it doesn't require HDFS.

