Java Hadoop - Can the input to a reducer be the output of a reducer?

I'm writing a map-reduce program with (currently) 3 map-reduce phases. I need to run another reduce on the output of the third phase's reducer. I could use an identity map (one that takes each (key, value) pair and emits it unchanged), but I'd rather avoid that extra map, for time and resource reasons, and simply pass the output straight to a reducer.

Is it possible? If so, how do I code the "jobs"?

I can post my whole code if that might help (maybe I'm doing something redundant/insufficient in the previous 3 phases).

Thank you for the help.

I don't think reduce-only jobs are feasible. If you want to run reducer 2 on the output of reducer 1, you can make map 2 an identity map, which simply means that map 2 performs no operation on reducer 1's output and passes it through to reducer 2.
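A rough sketch of that identity-map approach with the `org.apache.hadoop.mapreduce` API: the base `Mapper` class already emits each record unchanged, so it can be set directly as the mapper. The class name `FourthPhaseDriver`, the `SumReducer`, and the `(Text, IntWritable)` record types are assumptions made for illustration, as is the premise that the previous phase wrote its output with `SequenceFileOutputFormat`. It needs the Hadoop client libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver for a follow-up phase: identity map, then a new reduce.
public class FourthPhaseDriver {

    // Hypothetical reducer (an assumption): sums the values seen for each key.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "identity map + reduce");
        job.setJarByClass(FourthPhaseDriver.class);

        // Assumption: the previous reducer wrote (Text, IntWritable) records
        // as sequence files, so they can be read back without re-parsing text.
        job.setInputFormatClass(SequenceFileInputFormat.class);

        // The base Mapper's map() writes each (key, value) pair unchanged.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(SumReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // args[0]: output directory of the previous phase; args[1]: new output directory.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The map step here does no computation of its own; it exists only so that the framework sorts and groups the records by key before the reducer sees them.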

The main reason reducer-only jobs are not feasible is that a reducer node reads its data from the output of map nodes, which is why maps are required. I suggest visiting this page, which should clarify how map-reduce jobs work (www.javacrunch.in/MR.jsp).
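To make that dependency concrete, here is a small plain-Java model (no Hadoop involved) of what happens between the phases. It is only an illustration under simplifying assumptions: the names `ShuffleSketch`, `record`, `identityMap`, `shuffle`, and `reduce` are made up for this sketch, and everything runs in memory on one machine. The point is that the reducer consumes values grouped by key, and that grouping is built from map output, even when the map changes nothing.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of identity map -> shuffle -> reduce (not the Hadoop API).
public class ShuffleSketch {

    // Helper that builds one (key, value) record.
    public static Map.Entry<String, Integer> record(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Identity map: emits every input record unchanged.
    public static List<Map.Entry<String, Integer>> identityMap(
            List<Map.Entry<String, Integer>> records) {
        return new ArrayList<>(records);
    }

    // Shuffle: groups the map output by key, with keys in sorted order.
    public static Map<String, List<Integer>> shuffle(
            List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> r : mapOutput) {
            grouped.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        return grouped;
    }

    // Reduce: receives each key with all of its values and sums them.
    public static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the output of a previous reducer.
        List<Map.Entry<String, Integer>> previousOutput =
                Arrays.asList(record("a", 1), record("b", 2), record("a", 3));
        System.out.println(reduce(shuffle(identityMap(previousOutput))));
        // prints {a=4, b=2}
    }
}
```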

Hope this solves your query.

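Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.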
