How to run a MapReduce program locally on a laptop from the cmd shell in Windows 10

Question

I am trying to run a MapReduce program locally on a laptop with Hadoop 2.8 installed. I am confused about how to use the command below in the cmd shell.

This is my command, along with the mapper and reducer code. My data is in a CSV file.

D:\hadoop\bin\hadoop jar D:\hadoop\share\hadoop\tools\lib\hadoop-streaming-2.3.0.jar 
-D mapred.reduce.tasks=0
-file /reducer.py -mapper "mapper.py" 
-input /data2.csv -input /data2.csv 
-output /output

#!/usr/bin/python3
# mapper.py
import sys

# Input comes from STDIN (standard input), one CSV record per line
for line in sys.stdin:
    fields = line.strip().split(",")

    # Columns 1 and 2 are both read, so at least three fields are required
    if len(fields) >= 3:
        sex = fields[1]
        age = fields[2]
        print('%s\t%s' % (sex, age))

#!/usr/bin/python3
# reducer.py
import sys

sex_age = {}

# Group the ages by sex
for line in sys.stdin:
    line = line.strip()
    sex, age = line.split('\t')
    sex_age.setdefault(sex, []).append(int(age))

# Reducer: emit the average age for each sex
for sex in sex_age:
    ave_age = sum(sex_age[sex]) / len(sex_age[sex])
    print('%s\t%s' % (sex, ave_age))

Answer 1

That command should work the same in any Hadoop environment.

FWIW, you should probably switch to using at least PySpark.
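Before submitting the streaming job, it may help to check the mapper and reducer logic without Hadoop at all. The following is a minimal sketch, not part of the original answer, that mimics the map, sort, reduce flow of a streaming job in plain Python. The function names (`map_lines`, `reduce_lines`, `run_local`) and the sample rows are hypothetical, since the real layout of data2.csv is not shown in the question; the sketch only assumes that sex is in column 1 and age in column 2, as the mapper does.

```python
from collections import defaultdict


def map_lines(lines):
    """Mimic mapper.py: emit 'sex<TAB>age' for each CSV record."""
    for line in lines:
        fields = line.strip().split(",")
        # Skip rows that do not have both the sex and age columns
        if len(fields) >= 3:
            yield "%s\t%s" % (fields[1], fields[2])


def reduce_lines(lines):
    """Mimic reducer.py: average the ages seen for each sex."""
    ages = defaultdict(list)
    for line in lines:
        sex, age = line.strip().split("\t")
        ages[sex].append(int(age))
    for sex in sorted(ages):
        yield "%s\t%s" % (sex, sum(ages[sex]) / len(ages[sex]))


def run_local(lines):
    """Map, sort the mapper output (as the streaming shuffle would), then reduce."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__":
    # Hypothetical sample rows standing in for data2.csv
    sample = ["1,M,30", "2,F,25", "3,M,40", "bad row"]
    for out in run_local(sample):
        print(out)
```

If this produces the expected averages, the scripts themselves are probably fine and any remaining problem lies in the command. Note that `-D mapred.reduce.tasks=0` in the question's command requests a map-only job, so with that setting only the mapper output would be written.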


Answer 1
0 2020-01-19 16:10:10
