
Will there be shuffle and sort in a map-only task?

Does the shuffle and sort phase happen before the map task finishes, or after the map task has produced its output, so that the map task is never revisited? My confusion is about the map-only case. If a map-only task has no shuffle and sort, can someone explain how the data is written to the final output files?

When you have a map-only task, there is no shuffling at all (and no sorting either), which means that the mappers write the final output directly to HDFS.
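To make the difference concrete, here is a minimal in-memory sketch in Python. It is a toy model, not Hadoop code: `run_job`, `word_mapper`, `sum_reducer` and the byte-sum partitioner are hypothetical names made up for illustration. In the real Hadoop Java API a job becomes map-only when you call `job.setNumReduceTasks(0)`, and the output files are then named `part-m-NNNNN` rather than `part-r-NNNNN`.

```python
from collections import defaultdict


def word_mapper(line):
    """Hypothetical mapper: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word, 1


def sum_reducer(key, values):
    """Hypothetical reducer: sum all values seen for a key."""
    return key, sum(values)


def run_job(splits, mapper, reducer=None, num_reducers=0):
    """Toy model of a MapReduce job; returns {output file name: records}."""
    if num_reducers == 0:
        # Map-only: every mapper writes its records straight to its own
        # output file, in emission order. Nothing is partitioned, sorted,
        # or grouped.
        return {
            f"part-m-{i:05d}": [kv for rec in split for kv in mapper(rec)]
            for i, split in enumerate(splits)
        }

    # Full job: map output is partitioned by key (the "shuffle"), then each
    # reducer sees its keys in sorted order with the values grouped.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for rec in split:
            for key, value in mapper(rec):
                # Toy deterministic partitioner (an assumption, not Hadoop's).
                target = sum(key.encode()) % num_reducers
                partitions[target][key].append(value)
    return {
        f"part-r-{r:05d}": [reducer(k, vs) for k, vs in sorted(part.items())]
        for r, part in enumerate(partitions)
    }


splits = [["b a"], ["a c"]]  # two input splits, one line each
map_only = run_job(splits, word_mapper)
full = run_job(splits, word_mapper, sum_reducer, num_reducers=1)
```

In the map-only run each split yields its own file with records in the order the mapper emitted them, and the key `a` appears in both files. In the full run the single reducer file holds one sorted, aggregated record per key.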

On the other hand, when you have a full MapReduce program, with both mappers and reducers, then yes, shuffling can start before the reduce phase starts.

Quoting this very nice answer on SO:

First of all, shuffling is the process of transferring data from the mappers to the reducers, so I think it is obvious that it is necessary for the reducers, since otherwise they wouldn't be able to have any input (or input from every mapper). Shuffling can start even before the map phase has finished, to save some time. That's why you can see a reduce status greater than 0% (but less than 33%) when the map status is not yet 100%.
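The 33% figure in the quote follows from the way the reduce status is commonly described: the reduce side is split into three stages (copy/shuffle, sort/merge, reduce) and each one counts for a third of the reported percentage. The sketch below only illustrates that arithmetic under this assumption; `reduce_status` is a hypothetical helper, not a Hadoop function.

```python
def reduce_status(copy_done, sort_done, reduce_done):
    """Reduce status in percent, assuming three equally weighted stages.

    Each argument is the completed fraction (0.0 to 1.0) of one stage:
    copy (shuffle), sort (merge), and the reduce function itself.
    """
    for frac in (copy_done, sort_done, reduce_done):
        if not 0.0 <= frac <= 1.0:
            raise ValueError("stage fractions must be between 0 and 1")
    return (copy_done + sort_done + reduce_done) / 3 * 100


# Maps still running: reducers have copied half of the map output so far,
# and sorting cannot finish until every map has completed.
while_maps_running = reduce_status(0.5, 0.0, 0.0)

# All map output copied: the copy stage is complete, about a third overall.
after_copy = reduce_status(1.0, 0.0, 0.0)
```

Under this model the status stays below roughly 33% for as long as any map task is unfinished, because the copy stage cannot complete before the last mapper has produced its output.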

Hope this answer has clarified your confusion.
