When exactly does the combiner run for each mapper output?

When exactly does the combiner run? Even though you specify the combiner class in your driver code, it is still up to Hadoop to decide whether to run it on each mapper output. Could you please explain on what basis Hadoop decides this (is there a rule of thumb, equation, or formula)?

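The combiner runs on the map side, after the mapper and before the reducer, so the reducer's input is the (possibly combined) map output. It acts as a "mini-reducer": it groups the values that share a key, but only within the output of a single map task, not across all the data as the reducer does. A job usually consists of many map tasks, which may be what confused you; each task's output is combined separately.

Setting a combiner class in the driver is only a hint, though. Hadoop treats the combiner as an optimization and may apply it zero, one, or several times to a given map output. In practice it is applied when the in-memory map output buffer is spilled to disk, and it may be applied again when the spill files are merged, provided there are enough of them (as far as I know the threshold is the `mapreduce.map.combine.minspills` property, 3 by default). So there is no formula to rely on; write the combiner so that the final result is the same however many times it runs.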
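To make the "mini-reducer" idea concrete, here is a minimal sketch in plain Python. It is a simulation, not Hadoop code: `map_words`, `sum_by_key`, `run_job`, and the `combiner_passes` parameter are hypothetical names invented for this illustration. It applies the combiner to each map task's output separately and varies how many times it is applied, since Hadoop treats the combiner as an optimization and the final result should not depend on it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_words(line: str) -> List[Pair]:
    """Mapper: emit (word, 1) for every word in a line."""
    return [(word, 1) for word in line.split()]


def sum_by_key(pairs: Iterable[Pair]) -> List[Pair]:
    """Sum the values per key. Used as both the combiner and the reducer."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def run_job(splits: List[List[str]], combiner_passes: int) -> List[Pair]:
    """Simulate a word-count job (hypothetical helper, not a Hadoop API).

    The combiner is applied `combiner_passes` times to each map task's
    output; 0 means it is skipped entirely.
    """
    shuffled: List[Pair] = []
    for split in splits:  # one map task per input split
        task_output = [pair for line in split for pair in map_words(line)]
        for _ in range(combiner_passes):
            # The combiner only ever sees this one task's output.
            task_output = sum_by_key(task_output)
        shuffled.extend(task_output)
    # The reducer sees the output of all map tasks together.
    return sum_by_key(shuffled)


splits = [["a b a", "b a"], ["a c"]]
results = [run_job(splits, passes) for passes in (0, 1, 2)]
print(results[0])
```

Summing is associative and commutative, so the three runs agree. An operation such as a plain average would not give the same answer if reused as a combiner this way, which is why the combiner has to be written with repeated (or skipped) application in mind.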

See this Yahoo Tutorial for more details.
