
When exactly does the combiner run for each mapper output?

When exactly does the combiner run? Even though you specify the combiner class in your driver code, it is still up to Hadoop to decide whether to run it on each mapper's output. Could you please explain on what basis (is there any rule of thumb, equation, or formula) Hadoop decides whether to execute the combiner?

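The combiner runs after the mapper and before the reducer, on the map side. It can be seen as a part of the map task, so the input of the reducer is the output of the combiner whenever the combiner has run. Hadoop treats the combiner as an optional optimization, though, and does not guarantee how often it is called: for a given map task it may run zero, one, or several times. In practice it is typically applied each time the in-memory map output buffer is spilled to disk, and again while the spill files are merged if there are at least `mapreduce.map.combine.minspills` of them (3 by default in recent Hadoop versions). A job usually consists of many map tasks and the combiner works on each task's output separately, so that's maybe what got you confused. It acts as a "mini-reducer", meaning that it groups all the values that have the same key (the output of the mapper), but only for the data output by that one map task and not for all the data, unlike the reducer. For the same reason, the job must produce the same result no matter how many times the combiner runs.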
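To make the "mini-reducer" idea concrete, here is a minimal sketch in plain Java that does not use the Hadoop API at all. It simulates a single word-count map task whose buffer is "spilled" every few records, with the combiner applied to each spill. The class `CombinerSketch`, the methods `mapTask`, `flush`, and `sumByKey`, and the spill size of 3 are all hypothetical names and values chosen for illustration; real Hadoop spills are driven by buffer size in bytes, not by record count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {

    // Sums the counts per key. Used here both as the "combiner" and as the
    // "reducer", which is only valid because addition is associative and commutative.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            totals.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return totals;
    }

    // Simulates one map task that emits (word, 1) and "spills" every spillSize
    // records (an assumption for illustration). Returns the records that would
    // be sent on to the reducer.
    static List<Map.Entry<String, Integer>> mapTask(List<String> words, int spillSize,
                                                    boolean useCombiner) {
        List<Map.Entry<String, Integer>> sent = new ArrayList<>();
        List<Map.Entry<String, Integer>> buffer = new ArrayList<>();
        for (String w : words) {
            buffer.add(Map.entry(w, 1));
            if (buffer.size() == spillSize) {
                flush(buffer, sent, useCombiner);
            }
        }
        flush(buffer, sent, useCombiner); // final, possibly partial, spill
        return sent;
    }

    // Moves the buffered records to the output, combining them first if requested.
    static void flush(List<Map.Entry<String, Integer>> buffer,
                      List<Map.Entry<String, Integer>> sent, boolean useCombiner) {
        if (useCombiner) {
            for (Map.Entry<String, Integer> e : sumByKey(buffer).entrySet()) {
                sent.add(Map.entry(e.getKey(), e.getValue()));
            }
        } else {
            sent.addAll(buffer);
        }
        buffer.clear();
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "b", "a", "c", "b", "a");
        List<Map.Entry<String, Integer>> plain = mapTask(words, 3, false);
        List<Map.Entry<String, Integer>> combined = mapTask(words, 3, true);
        System.out.println("records without combiner: " + plain.size());
        System.out.println("records with combiner:    " + combined.size());
        System.out.println("same final counts: " + sumByKey(plain).equals(sumByKey(combined)));
    }
}
```

The point of the sketch is that the final counts are identical with or without the combiner, and only the number of records handed to the reducer changes. That is why a combiner is safe to skip or to run repeatedly for operations like sum, but would not be safe as-is for something like an average.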

See this Yahoo Tutorial for more details.
