
hadoop - MapReduce on a multi-machine cluster

I have configured a Hadoop cluster with two machines, MA and MB. I run the MapReduce program using the following command:

 hadoop  jar /HDP/hadoop-1.2.0.1.3.0.0-0380/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-0380.jar  -mapper "python C:\Python33\mapper.py"  -reducer "python C:\Python33\redu.py"  -input "/user/XXXX/input/input.txt"  -output "/user/XXXX/output/out20131112_09"

where the mapper (C:\Python33\mapper.py) and the reducer (C:\Python33\redu.py) are on MB's local disk.

UPDATE (screenshot omitted)

I have finally tracked down the error.

MA error log:

stderr logs
python: can't open file 'C:\Python33\mapper.py': [Errno 2] No such file or directory
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2

The mapper (C:\Python33\mapper.py) and the reducer (C:\Python33\redu.py) exist only on MB's local disk, not on MA's, which is why the task on MA cannot open them.

Now, do I need to copy my map/reduce scripts to MA, or how else can I resolve this?
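One common way around this (a sketch, not from the original post) is to let Hadoop Streaming ship the scripts to every task node with the -file option, so they only need to exist on the machine the job is submitted from; the mapper and reducer are then referenced by their bare file names:

 hadoop  jar /HDP/hadoop-1.2.0.1.3.0.0-0380/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-0380.jar  -file "C:\Python33\mapper.py"  -file "C:\Python33\redu.py"  -mapper "python mapper.py"  -reducer "python redu.py"  -input "/user/XXXX/input/input.txt"  -output "/user/XXXX/output/out20131112_09"

With -file, the scripts are packaged with the job and unpacked into each task's working directory, so no node needs them pre-installed on its local disk.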

Mapper

import sys

# Word-count mapper: read lines from standard input, split each line
# into words, and emit one tab-separated "<word> 1" pair per word.
for line in sys.stdin:
    line = line.strip()
    keys = line.split()
    for key in keys:
        value = 1
        print('%s\t%d' % (key, value))
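The reducer (redu.py) is not shown in the post. A minimal sum reducer that would pair with this mapper might look like the following; this is an assumption for illustration, not the original redu.py:

import sys

# Sum the counts for each word.  Hadoop Streaming sorts the mapper
# output by key, so all lines for the same word arrive consecutively.
current_key = None
current_count = 0

for line in sys.stdin:
    key, _, value = line.strip().partition('\t')
    try:
        count = int(value)
    except ValueError:
        continue  # skip malformed lines
    if key == current_key:
        current_count += count
    else:
        if current_key is not None:
            print('%s\t%d' % (current_key, current_count))
        current_key = key
        current_count = count

if current_key is not None:
    print('%s\t%d' % (current_key, current_count))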

If the map input file is smaller than dfs.block.size, you will end up with only one map task per job. For small inputs you can force Hadoop to run multiple map tasks by setting mapred.max.split.size (in bytes) to a value smaller than dfs.block.size. See the sketch below.
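For illustration only (assuming the job's input format honors mapred.max.split.size), the property could be passed as a generic -D option ahead of the streaming options, e.g. to cap splits at roughly 1 MB:

 hadoop  jar /HDP/hadoop-1.2.0.1.3.0.0-0380/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-0380.jar  -D mapred.max.split.size=1048576  -file "C:\Python33\mapper.py"  -file "C:\Python33\redu.py"  -mapper "python mapper.py"  -reducer "python redu.py"  -input "/user/XXXX/input/input.txt"  -output "/user/XXXX/output/out20131112_09"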
