
How to iterate MapReduce in Hadoop? (lang: python)

I wrote a MapReduce program (mapper.py and reducer.py) to compute PageRank on Hadoop.

I want to run the MapReduce job for about 10 iterations. How can I use the output of the first round as the input to the second round, and so on?

       1                    2                           10
[mapper->reducer] -> [mapper->reducer] -> ... -> [mapper->reducer] -> final result

You can chain the jobs: pass the output directory of job 1 as the input directory of job 2, and repeat for each subsequent round.

inputdir1 -> outputdir1 -> outputdir2 ... -> outputdir9 -> outputdir10
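
A minimal driver sketch of that chaining, assuming the jobs are run through Hadoop Streaming. The jar path, the `outputdir` prefix, and the helper names `build_commands` and `run_chain` are hypothetical and only there for illustration; adjust them to your cluster. It also assumes reducer.py writes records in the same format mapper.py reads, since each round consumes the previous round's output.

```python
import subprocess


def build_commands(streaming_jar, input_dir, output_prefix, iterations=10):
    """Build one Hadoop Streaming command per round.

    Round i reads the directory written by round i-1 (round 1 reads input_dir)
    and writes to output_prefix + str(i).
    """
    commands = []
    current_input = input_dir
    for i in range(1, iterations + 1):
        output_dir = f"{output_prefix}{i}"
        commands.append([
            "hadoop", "jar", streaming_jar,      # jar location is an assumption
            "-files", "mapper.py,reducer.py",    # ship both scripts to the cluster
            "-mapper", "python3 mapper.py",      # assumes python3 on the worker nodes
            "-reducer", "python3 reducer.py",
            "-input", current_input,
            "-output", output_dir,               # must not exist before the job runs
        ])
        current_input = output_dir               # this round's output feeds the next
    return commands


def run_chain(commands):
    """Run the rounds one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_chain(build_commands("/path/to/hadoop-streaming.jar", "inputdir1", "outputdir"))` would then run the 10 rounds in sequence, with the final result in outputdir10. Running the rounds sequentially matters because each one depends on the completed output of the one before it.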
