
Hadoop Secondary Sort Composite key compareTo vs Custom Sorter compare implementations

In Hadoop secondary sort, the composite key class implements WritableComparable and has the following method to compare values:

@Override
public int compareTo(CustomKey o) {

    int result = firstName.compareTo(o.getFirstName());     
    log.debug("value is " + result);                
    if (result == 0) {
        return lastName.compareTo(o.getLastName());
    }
    return result;
}

The custom sorter that we create to perform the secondary sort extends WritableComparator, and its code goes like this:

@Override
public int compare(WritableComparable w1, WritableComparable w2) {
    CustomKey key1 = (CustomKey) w1;
    CustomKey key2 = (CustomKey) w2;
    int value = key1.getFirstName().compareTo(key2.getFirstName());
    if (value == 0) {           
        return -key1.getLastName().compareTo(key2.getLastName());       
    }
    return value;
}

I want to know why we compare values twice for sorting: once in the CustomKey class by implementing WritableComparable, and then again in a CustomSorter class that extends WritableComparator.

I am not sure where the code you refer to is taken from, so I will try to answer in a generic way.

Here is the extract on secondary sorting from Hadoop: The Definitive Guide:

  1. Make the key a composite of the natural key and the natural value.
  2. The sort comparator should order by the composite key, that is, the natural key and natural value.
  3. The partitioner and grouping comparator for the composite key should consider only the natural key for partitioning and grouping.
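The three steps can be sketched in plain Java, without the Hadoop classes, to show which part of the composite key each piece looks at. This is only an illustration under assumptions: `PairKey` and `NaturalKeyRules` are hypothetical names, and in a real job the same rules would live in a `Partitioner` subclass and two `WritableComparator` subclasses.

```java
import java.util.Comparator;

// Step 1: a composite of the natural key and the natural value.
// Hypothetical plain-Java stand-in, not a Hadoop WritableComparable.
final class PairKey {
    final String naturalKey;   // e.g. "A"
    final int naturalValue;    // e.g. 1

    PairKey(String naturalKey, int naturalValue) {
        this.naturalKey = naturalKey;
        this.naturalValue = naturalValue;
    }
}

final class NaturalKeyRules {
    // Step 2: the sort comparator orders by natural key, then natural value.
    static final Comparator<PairKey> SORT =
        Comparator.comparing((PairKey k) -> k.naturalKey)
                  .thenComparingInt(k -> k.naturalValue);

    // Step 3: the grouping comparator looks at the natural key only,
    // so "A,1" and "A,2" compare as equal and fall into one group.
    static final Comparator<PairKey> GROUP =
        Comparator.comparing((PairKey k) -> k.naturalKey);

    // Step 3: the partitioner hashes the natural key only, so every
    // composite key with the same natural key goes to the same reducer.
    // The mask clears the sign bit so the result is never negative.
    static int partition(PairKey key, int numReducers) {
        return (key.naturalKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }
}
```

With these rules, `SORT` tells "A,1" and "A,2" apart, while `GROUP` and `partition` treat them as the same key.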

Grouping similar keys is very efficient once they are sorted. The grouping comparator is meant for this: it helps to efficiently identify the runs of keys that are similar.

Example: assume you get the following (composite) keys out of your mapper.

A,1

B,2

A,2

B,3

Ordered by the natural key, which is all the grouping comparator looks at, they fall into groups like this:

A,1

A,2

B,2

B,3

For secondary sorting to work, you then need to sort on the value part. That's what the SortingComparator achieves.

The final output would be (provided you have a partitioner that partitions on the key part of the composite key):

A,2

A,1

B,3

B,2
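The example can be reproduced with a small plain-Java simulation. It is a sketch under assumptions, not Hadoop code: the class and method names are hypothetical, keys are plain "key,value" strings, and the value part is sorted in descending order to match the output shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SecondarySortWalkthrough {
    // Sort comparator: key part ascending, then value part descending.
    static List<String> sortComposite(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((x, y) -> {
            String[] a = x.split(",");
            String[] b = y.split(",");
            int byKey = a[0].compareTo(b[0]);
            if (byKey != 0) {
                return byKey;
            }
            // Arguments swapped to get descending order on the value.
            return Integer.compare(Integer.parseInt(b[1]), Integer.parseInt(a[1]));
        });
        return sorted;
    }

    // Grouping: collect the values of each key part, in sorted order,
    // roughly as a single reduce call would see them.
    static Map<String, List<Integer>> group(List<String> sorted) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (String s : sorted) {
            String[] p = s.split(",");
            groups.computeIfAbsent(p[0], k -> new ArrayList<>())
                  .add(Integer.parseInt(p[1]));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> mapperOutput = Arrays.asList("A,1", "B,2", "A,2", "B,3");
        List<String> sorted = sortComposite(mapperOutput);
        System.out.println(sorted);         // [A,2, A,1, B,3, B,2]
        System.out.println(group(sorted));  // {A=[2, 1], B=[3, 2]}
    }
}
```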

A custom sorter is needed only under two conditions: 1) the sort order in the CustomSorter class differs from that of the compareTo method in your CompositeKey class; 2) you want the CustomSorter class's sorting logic to take precedence. If these conditions are not met, your CompositeKey class will suffice for sorting.
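The difference between the two orderings can be seen in a plain-Java analogy. This sketch makes assumptions: `NameKey` is a hypothetical stand-in for CustomKey that implements `Comparable` rather than WritableComparable, and `CUSTOM` mirrors the sorter from the question. In Hadoop itself, as far as I know, the custom class only takes effect once it is registered with `job.setSortComparatorClass(...)`; otherwise the key's own comparison is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for CustomKey.
final class NameKey implements Comparable<NameKey> {
    final String firstName;
    final String lastName;

    NameKey(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    // Same logic as compareTo in the question: both parts ascending.
    @Override
    public int compareTo(NameKey o) {
        int result = firstName.compareTo(o.firstName);
        return result != 0 ? result : lastName.compareTo(o.lastName);
    }

    @Override
    public String toString() {
        return firstName + " " + lastName;
    }
}

final class SorterOverrideDemo {
    // Same order as the custom sorter: first name ascending, last name descending.
    static final Comparator<NameKey> CUSTOM = (k1, k2) -> {
        int value = k1.firstName.compareTo(k2.firstName);
        return value != 0 ? value : k2.lastName.compareTo(k1.lastName);
    };

    static List<NameKey> sample() {
        return new ArrayList<>(Arrays.asList(
            new NameKey("Bob", "Kim"),
            new NameKey("Ann", "Lee"),
            new NameKey("Ann", "Wu")));
    }

    // No comparator supplied: the key's compareTo decides the order.
    static String naturalOrder() {
        List<NameKey> keys = sample();
        Collections.sort(keys);
        return keys.toString();
    }

    // Comparator supplied: it takes precedence over compareTo.
    static String customOrder() {
        List<NameKey> keys = sample();
        keys.sort(CUSTOM);
        return keys.toString();
    }

    public static void main(String[] args) {
        System.out.println(naturalOrder()); // [Ann Lee, Ann Wu, Bob Kim]
        System.out.println(customOrder());  // [Ann Wu, Ann Lee, Bob Kim]
    }
}
```

If both orderings were identical, the second class would add nothing, which is the point of condition 1.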
