How to iterate MapReduce in Hadoop? (lang: python)

Question

I wrote a MapReduce program (mapper.py and reducer.py) to compute PageRank in Hadoop.

I want to run the MapReduce job for about 10 iterations. How can I feed the output of the first round into the second round as its input?

       1                    2                           10
[mapper->reducer] -> [mapper->reducer] -> ... -> [mapper->reducer] -> final result

Answer 1

You can just chain the jobs: use the output directory of job1 as the input directory of job2, and repeat for each iteration.

inputdir1 -> outputdir1 -> outputdir2 ... -> outputdir9 -> outputdir10
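
The chain above can be driven from a small Python script. This is only a sketch under assumptions, not a definitive setup: it assumes the job runs through Hadoop Streaming, that mapper.py and reducer.py are executable (with a shebang line), and that the reducer writes lines in the same format the mapper reads. The jar path, the function names build_commands and run_all, and the directory prefix are placeholders chosen for illustration.

```python
import subprocess

# Assumed placeholder: the streaming jar location differs per installation.
STREAMING_JAR = "/path/to/hadoop-streaming.jar"


def build_commands(input_dir, output_prefix, iterations=10, jar=STREAMING_JAR):
    """Build one Hadoop Streaming command per iteration.

    The output directory of iteration i becomes the input of iteration i + 1.
    """
    commands = []
    current_input = input_dir
    for i in range(1, iterations + 1):
        output_dir = f"{output_prefix}{i}"
        commands.append([
            "hadoop", "jar", jar,
            "-files", "mapper.py,reducer.py",  # ship both scripts to the cluster
            "-mapper", "mapper.py",
            "-reducer", "reducer.py",
            "-input", current_input,
            "-output", output_dir,
        ])
        current_input = output_dir  # chain: this output feeds the next round
    return commands


def run_all(commands):
    """Run the jobs one after another; stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_all(build_commands("inputdir1", "outputdir")) would then run the ten rounds in order. Note that Hadoop refuses to write into an output directory that already exists, so each round needs a fresh directory name (as here) or the old ones must be removed first.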

1 answer

solution1
0 ACCEPTED 2017-04-13 20:42:57