
hadoop - MapReduce on a multi-machine cluster

I have configured a Hadoop cluster with two machines, MA and MB. I run the MapReduce program using the following command:

 hadoop  jar /HDP/hadoop-1.2.0.1.3.0.0-0380/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-0380.jar  -mapper "python C:\Python33\mapper.py"  -reducer "python C:\Python33\redu.py"  -input "/user/XXXX/input/input.txt"  -output "/user/XXXX/output/out20131112_09"

where the mapper (C:\Python33\mapper.py) and the reducer (C:\Python33\redu.py) are on MB's local disk.

UPDATE (screenshot omitted)

I have finally tracked down the error.

MA error log:

stderr logs
python: can't open file 'C:\Python33\mapper.py': [Errno 2] No such file or directory
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2

The mapper (C:\Python33\mapper.py) and the reducer (C:\Python33\redu.py) exist only on MB's local disk, not on MA's, which is why the task on MA cannot open them.

Now, do I need to copy my map/reduce scripts to MA, or how else can I resolve this?
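One common way around this (a sketch, not from the original post) is to let Hadoop Streaming ship the scripts to every task node with the -file option, so they only need to exist on the machine the job is submitted from; the mapper and reducer are then referenced by their bare file names:

 hadoop  jar /HDP/hadoop-1.2.0.1.3.0.0-0380/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-0380.jar  -file "C:\Python33\mapper.py"  -file "C:\Python33\redu.py"  -mapper "python mapper.py"  -reducer "python redu.py"  -input "/user/XXXX/input/input.txt"  -output "/user/XXXX/output/out20131112_09"

With -file, the scripts are packaged with the job and unpacked into each task's working directory, so no node needs them pre-installed on its local disk.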

Mapper

import sys

# Word-count mapper: read lines from standard input, split each line
# into words, and emit one tab-separated "<word> 1" pair per word.
for line in sys.stdin:
    line = line.strip()
    keys = line.split()
    for key in keys:
        value = 1
        print('%s\t%d' % (key, value))
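The reducer (redu.py) is not shown in the post. A minimal sum reducer that would pair with this mapper might look like the following; this is an assumption for illustration, not the original redu.py:

import sys

# Sum the counts for each word.  Hadoop Streaming sorts the mapper
# output by key, so all lines for the same word arrive consecutively.
current_key = None
current_count = 0

for line in sys.stdin:
    key, _, value = line.strip().partition('\t')
    try:
        count = int(value)
    except ValueError:
        continue  # skip malformed lines
    if key == current_key:
        current_count += count
    else:
        if current_key is not None:
            print('%s\t%d' % (current_key, current_count))
        current_key = key
        current_count = count

if current_key is not None:
    print('%s\t%d' % (current_key, current_count))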

If the map input file is smaller than dfs.block.size, you will end up with only one map task per job. For small inputs you can force Hadoop to run multiple map tasks by setting mapred.max.split.size (in bytes) to a value smaller than dfs.block.size. See the sketch below.
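For illustration only (assuming the job's input format honors mapred.max.split.size), the property could be passed as a generic -D option ahead of the streaming options, e.g. to cap splits at roughly 1 MB:

 hadoop  jar /HDP/hadoop-1.2.0.1.3.0.0-0380/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-0380.jar  -D mapred.max.split.size=1048576  -file "C:\Python33\mapper.py"  -file "C:\Python33\redu.py"  -mapper "python mapper.py"  -reducer "python redu.py"  -input "/user/XXXX/input/input.txt"  -output "/user/XXXX/output/out20131112_09"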
