
Java Hadoop - Can the input to a reducer be the output of a reducer?

Asked 2017-01-17 22:25:29 · Tags: java, hadoop, mapreduce

I'm writing a map-reduce program that currently has 3 map-reduce phases. I need to apply another reduce to the output of the 3rd phase's reducer. I could use an identity map (one that takes each (key, value) pair and outputs it unchanged), but I don't want to pay for that extra map in time and resources; I would rather pass the output straight to a reducer.

Is it possible? If so, how do I code the "jobs"?

I can post my whole code if it would help (maybe I'm doing something redundant/insufficient in the previous 3 phases).

Thank you for the help.

1 Answer

I don't think reduce-only jobs are feasible. If you want to run reducer 2 on the output of reducer 1, make map 2 an identity map: map 2 performs no operation on reducer 1's output and simply passes it through to reducer 2.
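As a rough sketch of what such a job could look like with the `org.apache.hadoop.mapreduce` API: the base `Mapper` class passes each (key, value) pair through unchanged, so it can serve as the identity map. The class names `FourthPhaseDriver` and `FourthPhaseReducer`, the `Text`/`IntWritable` types, the summing logic, and the assumption that phase 3 wrote its output with `SequenceFileOutputFormat` are all illustrative and not taken from the question.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver for a 4th phase: identity map followed by a reducer.
public class FourthPhaseDriver {

    // Hypothetical reducer: sums the values of each key (illustration only).
    public static class FourthPhaseReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "phase 4: identity map + reduce");
        job.setJarByClass(FourthPhaseDriver.class);

        // The base Mapper's map() writes each (key, value) unchanged.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(FourthPhaseReducer.class);

        // Assumption: phase 3 wrote Text/IntWritable pairs as sequence files,
        // so the pairs are read back with their original types.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // phase 3 output dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // phase 4 output dir

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The map phase here does no work of its own; it only feeds the pairs into the shuffle so that the reducer receives them grouped by key.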

The main reason reduce-only jobs are not feasible is that reducer nodes read their input from the output of map nodes, which is why a map phase is required. I suggest visiting this page, which should clarify how map-reduce jobs work (www.javacrunch.in/MR.jsp).
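To illustrate the data flow described above, here is a small plain-Java model (not Hadoop code) of one phase: an identity map emits the pairs unchanged, a shuffle step groups them by key, and only then does the reducer see them. The class `IdentityMapDemo`, its method names, and the summing reducer are hypothetical and exist only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy, in-memory model of identity map -> shuffle -> reduce.
public class IdentityMapDemo {

    // Identity map: every (key, value) pair is emitted unchanged.
    static List<Map.Entry<String, Integer>> identityMap(List<Map.Entry<String, Integer>> input) {
        return new ArrayList<>(input);
    }

    // Shuffle: group the map output by key, sorted by key.
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapped) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapped) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    // Reduce: sum the values of each key (an arbitrary example reducer).
    static SortedMap<String, Integer> reduce(SortedMap<String, List<Integer>> grouped) {
        SortedMap<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            out.put(e.getKey(), sum);
        }
        return out;
    }

    // One full phase: the reducer only ever consumes grouped map output.
    static SortedMap<String, Integer> run(List<Map.Entry<String, Integer>> previousReducerOutput) {
        return reduce(shuffle(identityMap(previousReducerOutput)));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> previous =
                List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        System.out.println(run(previous)); // prints {a=4, b=2}
    }
}
```

In this model the identity map is trivially cheap; what the extra phase really pays for is the regrouping of pairs by key before the reducer runs.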

Hope this solves your query.