
How does the reduce phase work after the map phase in Hadoop?

I have been reading about the Hadoop framework for the past few weeks, but there is one concept I cannot understand. Maybe this question is foolish; if so, sorry for that. Suppose I have to write a word count program for a file that is very large and is therefore distributed across 3 different datanodes. The map phase running on all three datanodes will produce key-value pairs, and after that a merge will be performed on the map output from all three datanodes. But I cannot understand what the next phase is. That is, how is the merged data distributed among the reduce tasks, how many reduce tasks will run, and how many datanodes will run the reduce phase? Please clear up this confusion, because it is stopping me from moving further in Hadoop. Thank you.

  1. Each map task processes its share of the input, then sorts and merges its output by key, using the compareTo() implementation of the map output key class. (For example, suppose three different key groups were produced: A, B and C.)
  2. Once the job reaches the appropriate phase, each reduce task transfers only the intermediate files it is responsible for. For example, a reduce task that is handling group A at the moment transfers only the files belonging to group A, from every machine that actually produced files for that group.
  3. The reducer performs its own sort and merge on the data it has transferred from the machines that ran the map tasks (i.e. you have files A.1, A.2 and A.3, but since each map task ran independently, the sort order of the combined data is not guaranteed, so sorting is now applied to the combined group of files).
  4. The reduce task then performs the required processing and writes the results to the final location.
  5. The same operation is repeated for each of the remaining groups.
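
The steps above can be sketched in plain Java, without any Hadoop dependency, as a small in-memory simulation of word count. This is only an illustration, not Hadoop's actual implementation: the class `ShuffleSketch` and its methods are hypothetical names, a `TreeMap` stands in for the sorted intermediate files, and the three strings stand in for the input splits on the three datanodes. The `partitionFor` method follows the formula used by Hadoop's default `HashPartitioner`, which is what decides which reduce task receives a given key. In a real job the number of reduce tasks is a job setting (for example via `Job.setNumReduceTasks`), not a function of the number of datanodes, and the reduce tasks are scheduled on whichever nodes have free capacity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of map -> partition -> shuffle -> sort/merge -> reduce.
public class ShuffleSketch {

    // Same formula as Hadoop's default HashPartitioner. Here the key is a
    // java.lang.String, so the hash values differ from those of Hadoop's Text.
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Step 1: one map task emits (word, 1) pairs for its split. The output is
    // bucketed per reduce task and kept sorted by key inside each bucket.
    public static List<TreeMap<String, List<Integer>>> runMapTask(String split, int numReduceTasks) {
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeMap<>());
        }
        for (String word : split.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            partitions.get(partitionFor(word, numReduceTasks))
                      .computeIfAbsent(word, k -> new ArrayList<>())
                      .add(1);
        }
        return partitions;
    }

    // Steps 2-4: a reduce task fetches only its own partition from every map
    // output, merges the pieces into one sorted view, then sums the values per key.
    public static TreeMap<String, Integer> runReduceTask(
            int reducerId, List<List<TreeMap<String, List<Integer>>>> mapOutputs) {
        TreeMap<String, List<Integer>> merged = new TreeMap<>();
        for (List<TreeMap<String, List<Integer>>> mapOutput : mapOutputs) {
            for (Map.Entry<String, List<Integer>> e : mapOutput.get(reducerId).entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : merged.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    // Step 5: repeat the reduce side once per reduce task (i.e. per partition).
    public static List<TreeMap<String, Integer>> runJob(List<String> splits, int numReduceTasks) {
        List<List<TreeMap<String, List<Integer>>>> mapOutputs = new ArrayList<>();
        for (String split : splits) {
            mapOutputs.add(runMapTask(split, numReduceTasks));
        }
        List<TreeMap<String, Integer>> results = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) {
            results.add(runReduceTask(r, mapOutputs));
        }
        return results;
    }

    public static void main(String[] args) {
        // Three made-up splits, one per datanode; two reduce tasks chosen arbitrarily.
        List<String> splits = List.of("apple banana apple", "banana cherry", "apple cherry cherry");
        List<TreeMap<String, Integer>> out = runJob(splits, 2);
        for (int r = 0; r < out.size(); r++) {
            System.out.println("reducer " + r + " -> " + out.get(r));
        }
    }
}
```

Because the partition is computed from the key alone, every occurrence of a given word ends up at the same reduce task no matter which map task emitted it, so each word's total appears in exactly one reducer's output.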

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 