
In Hadoop, how does a reduce task pull data from a map task?

I understand that a reducer pulls map output over HTTP. But since each map task merges all its spills into one file, how can a reduce task pull its intermediate data from the map task? Does it fetch just a piece of that file?

The output of a map task is sorted by partition number, and each partition number corresponds to one reducer. When a reducer pulls the output, the file pointer is moved to the starting offset of that reducer's partition, and reading starts from there. To make this possible, a table mapping partition numbers to file offsets is maintained on the mapper side, so only the reducer's own segment of the file is read and sent.
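
The lookup described above can be sketched as a toy model. This is only an illustration, not Hadoop's actual on-disk format or API: the function names `write_map_output` and `fetch_partition`, the tab-separated record layout, and the in-memory buffer standing in for the merged spill file are all assumptions made for the example.

```python
import io

def write_map_output(records, num_partitions):
    """Write records grouped by partition and build a partition -> offset index."""
    data = io.BytesIO()
    index = []  # index[p] = (start_offset, length) for partition p (hypothetical layout)
    for p in range(num_partitions):
        start = data.tell()
        # Within one partition, records are kept sorted by key.
        for _, key, value in sorted(r for r in records if r[0] == p):
            data.write(f"{key}\t{value}\n".encode("utf-8"))
        index.append((start, data.tell() - start))
    return data.getvalue(), index

def fetch_partition(map_output, index, partition):
    """Serve only the byte range that belongs to one reducer."""
    start, length = index[partition]
    f = io.BytesIO(map_output)
    f.seek(start)          # jump to the start of this reducer's segment
    return f.read(length)  # read exactly that segment, nothing else

# Each record is (partition, key, value); partition p goes to reducer p.
records = [(1, "b", 2), (0, "a", 1), (1, "a", 3), (0, "c", 4)]
map_output, index = write_map_output(records, num_partitions=2)
segment = fetch_partition(map_output, index, partition=1)
```

Here reducer 1 receives only the second half of the single output file, without the mapper having to split the file per reducer.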
