I wrote a MapReduce program (mapper.py and reducer.py) to compute PageRank in Hadoop.
I want to run the MapReduce job for about 10 iterations. How can I use the output of the first round as the input to the second round?
[mapper->reducer] (round 1) -> [mapper->reducer] (round 2) -> ... -> [mapper->reducer] (round 10) -> final result
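You can chain the jobs: use the output directory of job 1 as the input directory of job 2, the output of job 2 as the input of job 3, and so on. For this to work, the reducer has to write its output in the same format the mapper reads.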
inputdir1 -> outputdir1 -> outputdir2 ... -> outputdir9 -> outputdir10
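The chain above could be driven by a small script like the following. This is only a sketch, assuming the jobs are run with Hadoop Streaming: the jar path in STREAMING_JAR is a placeholder, python3 is assumed to be available on the cluster nodes, and the helper names (io_dirs, streaming_command, run_chain) are made up for illustration.

```python
import subprocess

# Placeholder path (assumption): the streaming jar location depends on the install.
STREAMING_JAR = "/path/to/hadoop-streaming.jar"


def io_dirs(first_input, rounds):
    """Return one (input_dir, output_dir) pair per round."""
    pairs = []
    current = first_input
    for i in range(1, rounds + 1):
        output = f"outputdir{i}"
        pairs.append((current, output))
        current = output  # this round's output feeds the next round
    return pairs


def streaming_command(input_dir, output_dir, jar=STREAMING_JAR):
    """Build the Hadoop Streaming command for a single round."""
    return [
        "hadoop", "jar", jar,
        "-files", "mapper.py,reducer.py",    # ship both scripts to the cluster
        "-mapper", "python3 mapper.py",      # assumes python3 on the nodes
        "-reducer", "python3 reducer.py",
        "-input", input_dir,
        "-output", output_dir,
    ]


def run_chain(first_input="inputdir1", rounds=10):
    """Run the rounds one after another, stopping if any job fails."""
    for input_dir, output_dir in io_dirs(first_input, rounds):
        subprocess.run(streaming_command(input_dir, output_dir), check=True)


if __name__ == "__main__":
    run_chain()
```

Each round has to finish before the next one starts, which subprocess.run guarantees because it blocks until the job exits. Hadoop refuses to write into an output directory that already exists, so the outputdir directories from an earlier run need to be removed before rerunning the script.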