
Any tips on debugging problems with the secondary sort in a Hadoop MapReduce job?

I believe (believed?) I understand how secondary sort works in Hadoop. I created an intermediate key consisting of 4 fields. I partition by the first field, group by the first and second fields, and sort by all 4.
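To make the setup concrete, here is a minimal plain-Java sketch of the three roles described above. It is not the code from the actual job: the class names, the field names f1..f4, and their types are assumptions for illustration, and the Hadoop classes are left out so it runs on its own. In a real job these roles would be registered through Job.setPartitionerClass, Job.setGroupingComparatorClass, and Job.setSortComparatorClass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical 4-field key; names and types are placeholders, not from the original job.
final class FourFieldKey {
    final String f1;
    final String f2;
    final long f3;
    final long f4;

    FourFieldKey(String f1, String f2, long f3, long f4) {
        this.f1 = f1;
        this.f2 = f2;
        this.f3 = f3;
        this.f4 = f4;
    }

    @Override
    public String toString() {
        return f1 + "|" + f2 + "|" + f3 + "|" + f4;
    }
}

final class SecondarySortModel {
    // Sort order: all four fields, most significant first.
    static final Comparator<FourFieldKey> SORT =
            Comparator.comparing((FourFieldKey k) -> k.f1)
                    .thenComparing(k -> k.f2)
                    .thenComparingLong(k -> k.f3)
                    .thenComparingLong(k -> k.f4);

    // Grouping: first two fields only; these must be a prefix of SORT.
    static final Comparator<FourFieldKey> GROUP =
            Comparator.comparing((FourFieldKey k) -> k.f1)
                    .thenComparing(k -> k.f2);

    // Partitioning: first field only, so all keys of one group share a partition.
    static int partition(FourFieldKey k, int numPartitions) {
        return (k.f1.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Sorts the keys, then starts a new group wherever GROUP sees a change,
    // mimicking how each reduce() call receives one group in sorted order.
    static List<List<FourFieldKey>> reduceGroups(List<FourFieldKey> keys) {
        List<FourFieldKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<FourFieldKey>> groups = new ArrayList<>();
        FourFieldKey prev = null;
        for (FourFieldKey k : sorted) {
            if (prev == null || GROUP.compare(prev, k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
            prev = k;
        }
        return groups;
    }
}
```

Feeding a handful of hand-written keys through reduceGroups and comparing the result with what the real reducer logs is one way to tell whether the comparators or something else in the job is at fault.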

It looks like I nailed down grouping and partitioning, but the values arrive at the reducer out of order.

Any ideas on how to approach debugging this?

At the moment, it appears that static code review, either manual or tool-assisted, works well. I believe I broke the rule: when overriding compareTo(), don't forget to override equals() and hashCode(). I'll keep everyone posted on whether fixing this solves the problem.
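For reference, a minimal sketch of a key that follows that rule, with a small pairwise check that can be run in a unit test. The class name, field names, and types are assumptions, not the original code; a real Hadoop key would also implement WritableComparable, which is omitted here so the example runs without Hadoop on the classpath. Whether this rule is the actual cause of the ordering problem is still unconfirmed.

```java
import java.util.Objects;

// Hypothetical key; field names and types are illustrative assumptions.
final class ConsistentKey implements Comparable<ConsistentKey> {
    private final String f1;
    private final String f2;
    private final long f3;
    private final long f4;

    ConsistentKey(String f1, String f2, long f3, long f4) {
        this.f1 = f1;
        this.f2 = f2;
        this.f3 = f3;
        this.f4 = f4;
    }

    // Orders by all four fields, most significant first.
    @Override
    public int compareTo(ConsistentKey o) {
        int c = f1.compareTo(o.f1);
        if (c != 0) return c;
        c = f2.compareTo(o.f2);
        if (c != 0) return c;
        c = Long.compare(f3, o.f3);
        if (c != 0) return c;
        return Long.compare(f4, o.f4);
    }

    // Uses exactly the fields compareTo() uses, so equals() is true iff compareTo() == 0.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof ConsistentKey)) return false;
        ConsistentKey o = (ConsistentKey) obj;
        return f1.equals(o.f1) && f2.equals(o.f2) && f3 == o.f3 && f4 == o.f4;
    }

    // Hashes the same fields, so equal keys always share a hash code.
    @Override
    public int hashCode() {
        return Objects.hash(f1, f2, f3, f4);
    }

    // Returns true when compareTo(), equals(), and hashCode() agree for the pair (a, b).
    static boolean consistent(ConsistentKey a, ConsistentKey b) {
        int ab = a.compareTo(b);
        int ba = b.compareTo(a);
        boolean antisymmetric = Integer.signum(ab) == -Integer.signum(ba);
        boolean equalsAgrees = (ab == 0) == a.equals(b);
        boolean hashAgrees = !a.equals(b) || a.hashCode() == b.hashCode();
        return antisymmetric && equalsAgrees && hashAgrees;
    }
}
```

Running consistent() over every pair in a small sample of real keys is a cheap way to back up the manual code review.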
