Order of Mapper, Combiner, partitioner, shuffle/sort

Original post: 2015-01-06 01:10:38 (tag: hadoop)

I found the following text in Hadoop: The Definitive Guide, on page 206.

Before it writes to disk, the thread first divides the data into partitions corresponding to the reducers that they will ultimately be sent to. Within each partition, the background thread performs an in-memory sort by key, and if there is a combiner function, it is run on the output of the sort. Running the combiner function makes for a more compact map output, so there is less data to write to local disk and to transfer to the reducer.

So, with this understanding, can I take the order to be Mapper, partitioner, shuffle/sort, Combiner?
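The sequence in the quoted passage (partition, then sort by key within each partition, then combiner) can be sketched as a small plain-Java simulation of a single spill. This is only an illustration, not Hadoop API code: the class SpillSketch, the hash-modulo partitioner, and the summing combiner are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of one map-side spill; hypothetical names, not Hadoop classes.
class SpillSketch {

    // Assumed partitioner: non-negative hash of the key modulo the reducer count.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Assumed combiner for a word-count style job: sums the values of one key.
    static int combine(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Returns one sorted, combined map per partition (the contents of a spill).
    static List<TreeMap<String, Integer>> spill(
            List<Map.Entry<String, Integer>> buffer, int numReducers) {
        // Step 1: divide the buffered map output into one partition per reducer.
        // Step 2: a TreeMap keeps the keys of each partition in sorted order.
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> rec : buffer) {
            partitions.get(partitionFor(rec.getKey(), numReducers))
                    .computeIfAbsent(rec.getKey(), k -> new ArrayList<>())
                    .add(rec.getValue());
        }
        // Step 3: run the combiner over the sorted output of each partition.
        List<TreeMap<String, Integer>> spilled = new ArrayList<>();
        for (TreeMap<String, List<Integer>> partition : partitions) {
            TreeMap<String, Integer> combined = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> e : partition.entrySet()) {
                combined.put(e.getKey(), combine(e.getValue()));
            }
            spilled.add(combined);
        }
        return spilled;
    }

    public static void main(String[] args) {
        // Map output for the words b, a, b, c, each emitted with a count of 1.
        List<Map.Entry<String, Integer>> buffer = List.of(
                Map.entry("b", 1), Map.entry("a", 1),
                Map.entry("b", 1), Map.entry("c", 1));
        System.out.println(spill(buffer, 2));
    }
}
```

In this sketch the two "b" records collapse into a single record before anything is written, which is the "more compact map output" the passage refers to.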

1 Answer

I've written an article about this: http://0x0fff.com/hadoop-mapreduce-comprehensive-description/ In general you are right, but there are many more corner cases: the combiner might be skipped for some of the records, for others it might run many times, and it can even be started on the reduce side before the reducer. So you are right in general, but things are considerably more complex.
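Because the combiner may be skipped or run more than once, the final result should not depend on how often it runs. A minimal sketch of that property, assuming a summing combiner and reducer; the class CombinerRunsSketch and its methods are hypothetical and only simulate the idea in plain Java, they are not part of Hadoop:

```java
import java.util.ArrayList;
import java.util.List;

// Simulates a reducer receiving data after the combiner ran 0, 1, or more times.
class CombinerRunsSketch {

    // Assumed combiner and reducer logic: a plain sum.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // spills: the values of a single key, grouped by the spill they landed in.
    // combinerRuns: how many times the combiner is applied before the reducer.
    static int reduce(List<List<Integer>> spills, int combinerRuns) {
        List<Integer> values = new ArrayList<>();
        for (List<Integer> spill : spills) {
            if (combinerRuns == 0) {
                values.addAll(spill);   // combiner skipped: raw values pass through
            } else {
                values.add(sum(spill)); // first run: once per spill
            }
        }
        // Any further runs, for example while spills are merged.
        for (int run = 1; run < combinerRuns; run++) {
            List<Integer> merged = new ArrayList<>();
            merged.add(sum(values));
            values = merged;
        }
        return sum(values);             // the reducer itself
    }

    public static void main(String[] args) {
        List<List<Integer>> spills = List.of(
                List.of(1, 2), List.of(3), List.of(4, 5, 6));
        for (int runs = 0; runs <= 3; runs++) {
            System.out.println(runs + " combiner run(s): " + reduce(spills, runs));
        }
    }
}
```

A sum passes this check for any number of runs. A function such as a plain average would not, since an average of per-spill averages generally differs from the average of the raw values.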

