
How to run a MapReduce program locally on a laptop from the cmd shell in Windows 10

I am trying to run a MapReduce program locally on a laptop with Hadoop 2.8 installed. I am confused about how to use the command below in the cmd shell.

This is my command, followed by the mapper and reducer code. My data is in a CSV file.

D:\hadoop\bin\hadoop jar D:\hadoop\share\hadoop\tools\lib\hadoop-streaming-2.3.0.jar 
-D mapred.reduce.tasks=0
-file /reducer.py -mapper "mapper.py" 
-input /data2.csv -input /data2.csv 
-output /output

#!/usr/bin/python3
# mapper.py
import sys

# Input comes from STDIN (standard input), one CSV record per line.
for line in sys.stdin:
    fields = line.strip().split(",")

    # Indexes 1 and 2 are read below, so at least three fields are required.
    if len(fields) >= 3:
        sex = fields[1]
        age = fields[2]
        print('%s\t%s' % (sex, age))

#!/usr/bin/python3
# reducer.py
import sys

# Ages collected per sex key.
sex_age = {}

# Group the mapper output (sex<TAB>age) by key.
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    sex, age = line.split('\t')
    sex_age.setdefault(sex, []).append(int(age))

# Emit the average age for each sex.
for sex, ages in sex_age.items():
    ave_age = sum(ages) / len(ages)
    print('%s\t%s' % (sex, ave_age))
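
Before submitting anything to Hadoop, the mapper and reducer logic can be checked on the laptop without a cluster. The following is a minimal sketch, not part of the original scripts: the function names `map_line`, `reduce_pairs`, and `run_local` are hypothetical, the sample rows are made up, and it assumes a CSV with no header row and at least three comma-separated fields per record (sex in the second field, age in the third).

```python
from itertools import groupby


def map_line(line):
    """Follow mapper.py: return 'sex<TAB>age' for a CSV record, or None."""
    fields = line.strip().split(",")
    if len(fields) >= 3:
        return "%s\t%s" % (fields[1], fields[2])
    return None


def reduce_pairs(pairs):
    """Follow reducer.py: average the ages for each sex key."""
    averages = {}
    # Sorting stands in for the shuffle/sort step between map and reduce.
    keyed = sorted(p.split("\t") for p in pairs)
    for sex, group in groupby(keyed, key=lambda kv: kv[0]):
        ages = [int(age) for _, age in group]
        averages[sex] = sum(ages) / len(ages)
    return averages


def run_local(lines):
    """Run map, then sort and reduce, entirely in memory."""
    mapped = [out for out in map(map_line, lines) if out is not None]
    return reduce_pairs(mapped)


if __name__ == "__main__":
    # Hypothetical sample rows in the form id,sex,age (not the real data).
    sample = ["1,M,30", "2,F,25", "3,M,40", "4,F,35"]
    for sex, avg in sorted(run_local(sample).items()):
        print("%s\t%s" % (sex, avg))
```

If this produces sensible averages for a few lines of the real CSV, any remaining problem is more likely in the Hadoop command than in the Python logic.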

That command should work the same in any Hadoop environment.
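
One cmd-specific detail: cmd treats a line break as the end of a command unless the line ends with a caret (`^`), so a command pasted across several lines has to be entered as a single line. As a rough sketch, the arguments from the question can be assembled in Python and printed on one line. The helper name `build_streaming_command` is hypothetical; the paths and options are copied from the question as posted, except that the duplicated `-input /data2.csv` is listed once.

```python
def build_streaming_command():
    """Return the streaming command from the question as an argument list."""
    return [
        r"D:\hadoop\bin\hadoop", "jar",
        r"D:\hadoop\share\hadoop\tools\lib\hadoop-streaming-2.3.0.jar",
        "-D", "mapred.reduce.tasks=0",
        "-file", "/reducer.py",
        "-mapper", "mapper.py",
        "-input", "/data2.csv",   # given twice in the question; kept once here
        "-output", "/output",
    ]


if __name__ == "__main__":
    # None of the arguments contain spaces, so a plain join is enough.
    print(" ".join(build_streaming_command()))
```

The sketch does not correct the options themselves. Note that with `mapred.reduce.tasks=0` the job is map-only, so a reducer script would not be invoked.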

FWIW, you should probably switch to using at least PySpark.
