
MapReduce secondary sort doesn't work

I'm trying to do a secondary sort in MapReduce with a composite key that consists of:

  • String natural-key = program name

  • Long key-for-sorting = time in milliseconds since 1970

The problem is that after sorting I get lots of reducers, one for each distinct composite key instead of one per program name.

By debugging I have verified that the hashCode and compare functions are correct. In the debug logs, where each block comes from a different reducer, it looks like either the grouping or the partitioning didn't succeed. From the debug logs:

14/12/14 00:55:12 INFO popularitweet.EtanReducer: key=the voice
14/12/14 00:55:12 INFO popularitweet.EtanReducer: the voice: Thu Dec 11 17:51:03 +0000 2014
14/12/14 00:55:12 INFO popularitweet.EtanReducer: the voice: Thu Dec 11 17:51:03 +0000 2014
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key the voice ended



14/12/14 00:55:12 INFO popularitweet.EtanReducer: key=top gear
14/12/14 00:55:12 INFO popularitweet.EtanReducer: top gear: Thu Dec 11 17:51:04 +0000 2014
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key top gear ended



14/12/14 00:55:12 INFO popularitweet.EtanReducer: key=american horror story
14/12/14 00:55:12 INFO popularitweet.EtanReducer: american horror story: Thu Dec 11 17:51:04 +0000 2014
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key american horror story ended



14/12/14 00:55:12 INFO popularitweet.EtanReducer: key=the voice
14/12/14 00:55:12 INFO popularitweet.EtanReducer: the voice: Thu Dec 11 17:51:04 +0000 2014
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key the voice ended

As you can see, "the voice" is sent to two different reducers, and the only difference between the two is the timestamp. Any help would be appreciated. The composite key is the following class:

public class ProgramKey implements WritableComparable<ProgramKey> {
private String program;
private Long timestamp;

public ProgramKey() {
}

public ProgramKey(String program, Long timestamp) {
    this.program = program;
    this.timestamp = timestamp;
}

@Override
public int compareTo(ProgramKey o) {
    int result = program.compareTo(o.program);
    if (result == 0) {
        result = timestamp.compareTo(o.timestamp);
    }
    return result;
}

@Override
public void write(DataOutput dataOutput) throws IOException {
    WritableUtils.writeString(dataOutput, program);
    dataOutput.writeLong(timestamp);
}

@Override
public void readFields(DataInput dataInput) throws IOException {
    program = WritableUtils.readString(dataInput);
    timestamp = dataInput.readLong();
}

// getProgram(), getTimestamp(), and hashCode() are not shown.
}

My implemented Partitioner, GroupingComparator, and SortingComparator are these:

public class ProgramKeyPartitioner extends Partitioner<ProgramKey, TweetObject> {

@Override
public int getPartition(ProgramKey programKey, TweetObject tweetObject, int numPartitions) {
    int hash = programKey.getProgram().hashCode();
    int partition = hash % numPartitions;
    return partition;
}

}
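A side note on the partitioner above, unrelated to the grouping symptom: String.hashCode() can be negative, and Java's % operator keeps the sign of the dividend, so hash % numPartitions can produce a negative index, while getPartition is expected to return a value from 0 to numPartitions - 1. Hadoop's own HashPartitioner avoids this by masking the hash with Integer.MAX_VALUE first. The sketch below is plain Java without any Hadoop dependency; the class name, method names, and the sample key are made up for illustration.

```java
public class PartitionIndexDemo {

    /** Same arithmetic as the partitioner in the question: the result may be negative. */
    static int naivePartition(String program, int numPartitions) {
        return program.hashCode() % numPartitions;
    }

    /** Clears the sign bit before taking the remainder, so the result is in [0, numPartitions). */
    static int nonNegativePartition(String program, int numPartitions) {
        return (program.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // Made-up key, picked only because its hash code happens to be negative.
        String key = "zzzzzz";
        int numPartitions = 3;
        System.out.println("hashCode = " + key.hashCode());
        System.out.println("naive    = " + naivePartition(key, numPartitions));
        System.out.println("masked   = " + nonNegativePartition(key, numPartitions));
    }
}
```

In the real partitioner the same change would be a one-line edit to the computation of partition, assuming nothing else depends on the unmasked value.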

public class ProgramKeyGroupingComparator extends WritableComparator {
protected ProgramKeyGroupingComparator() {
    super(ProgramKey.class, true);
}

@Override
public int compare(WritableComparable a, WritableComparable b) {
    ProgramKey k1 = (ProgramKey) a;
    ProgramKey k2 = (ProgramKey) b;
    return k1.getProgram().compareTo(k2.getProgram());
}

}

public class TimeStampComparator extends WritableComparator {
protected TimeStampComparator() {
    super(ProgramKey.class, true);
}

@Override
public int compare(WritableComparable a, WritableComparable b) {
    ProgramKey ts1 = (ProgramKey)a;
    ProgramKey ts2 = (ProgramKey)a;

    int result = ts1.getProgram().compareTo(ts2.getProgram());
    if (result == 0) {
        result = ts1.getTimestamp().compareTo(ts2.getTimestamp());
    }
    return result;
}

}

    public static void main(String[] args) throws IOException,
        InterruptedException, ClassNotFoundException {



    // Create configuration
    Configuration conf = new Configuration();

    // Create job
    Job job = new Job(conf, "test1");
    job.setJarByClass(EtanMapReduce.class);

    // Set partitioner keyComparator and groupComparator
    job.setPartitionerClass(ProgramKeyPartitioner.class);
    job.setGroupingComparatorClass(ProgramKeyGroupingComparator.class);
    job.setSortComparatorClass(TimeStampComparator.class);

    // Setup MapReduce
    job.setMapperClass(EtanMapper.class);
    job.setMapOutputKeyClass(ProgramKey.class);
    job.setMapOutputValueClass(TweetObject.class);
    job.setReducerClass(EtanReducer.class);

    // Specify key / value
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(TweetObject.class);

    // Input
    FileInputFormat.addInputPath(job, inputPath);
    job.setInputFormatClass(TextInputFormat.class);

    // Output
    FileOutputFormat.setOutputPath(job, outputDir);
    job.setOutputFormatClass(TextOutputFormat.class);

    // Delete output if exists
    FileSystem hdfs = FileSystem.get(conf);
    if (hdfs.exists(outputDir))
        hdfs.delete(outputDir, true);

    // Execute job
    logger.info("starting job");
    int code = job.waitForCompletion(true) ? 0 : 1;
    System.exit(code);

}    

Edit...

Your TimeStampComparator seems to have a typo: you're setting ts2 to a when it should be set to b:

ProgramKey ts1 = (ProgramKey)a;
ProgramKey ts2 = (ProgramKey)a;

when it should be:

ProgramKey ts1 = (ProgramKey)a;
ProgramKey ts2 = (ProgramKey)b;

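With both variables cast from a, the comparator compares every key with itself and always returns 0, so the sort step treats all keys as equal. This results in incorrectly sorted key/value pairs and invalidates the assumption made by the grouping comparator that the key/value pairs are sorted: keys with the same program name are no longer adjacent, so each run of adjacent equal names becomes its own group.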
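The effect can be reproduced without Hadoop. The following is a minimal sketch in plain Java, not the actual job: Key is a hypothetical stand-in for ProgramKey, List.sort stands in for the shuffle sort, and countGroups mimics a grouping comparator by counting runs of adjacent keys with the same program name. The sample keys follow the order of the log output in the question, with shortened made-up timestamps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortComparatorTypoDemo {

    /** Hypothetical stand-in for ProgramKey: natural key plus sort key. */
    static final class Key {
        final String program;
        final long timestamp;

        Key(String program, long timestamp) {
            this.program = program;
            this.timestamp = timestamp;
        }
    }

    /** Mirrors the typo: both operands come from 'a', so the result is always 0. */
    static final Comparator<Key> BUGGY = (a, b) -> {
        Key k1 = a;
        Key k2 = a; // the typo: should be b
        int r = k1.program.compareTo(k2.program);
        return r != 0 ? r : Long.compare(k1.timestamp, k2.timestamp);
    };

    /** Corrected version: program name first, then timestamp. */
    static final Comparator<Key> FIXED = (a, b) -> {
        int r = a.program.compareTo(b.program);
        return r != 0 ? r : Long.compare(a.timestamp, b.timestamp);
    };

    /** Counts runs of adjacent keys sharing a program name (one reduce() call per run). */
    static int countGroups(List<Key> keys) {
        int groups = 0;
        String previous = null;
        for (Key k : keys) {
            if (!k.program.equals(previous)) {
                groups++;
                previous = k.program;
            }
        }
        return groups;
    }

    /** Sorts a copy of the input with the given comparator and counts the resulting groups. */
    static int groupsAfterSort(List<Key> input, Comparator<Key> cmp) {
        List<Key> copy = new ArrayList<>(input);
        copy.sort(cmp);
        return countGroups(copy);
    }

    /** Sample keys in the order they appear in the question's log. */
    static List<Key> sample() {
        return Arrays.asList(
                new Key("the voice", 3L),
                new Key("top gear", 4L),
                new Key("american horror story", 4L),
                new Key("the voice", 4L));
    }

    public static void main(String[] args) {
        System.out.println("groups with buggy comparator: " + groupsAfterSort(sample(), BUGGY));
        System.out.println("groups with fixed comparator: " + groupsAfterSort(sample(), FIXED));
    }
}
```

With the buggy comparator the list is left in input order, so "the voice" forms two separate groups; with the fixed comparator the two "the voice" keys end up adjacent and form one group. Hadoop's own sort is not guaranteed to preserve input order the way List.sort does, so the exact fragmentation in a real job may differ.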

Also check that the original program names are in UTF-8, as that's what WritableUtils assumes. Is your system's default code page also UTF-8?
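As a rough illustration of why the encoding could matter (a guess at the scenario, not something confirmed by the question): if the same input bytes are decoded with different charsets, the resulting program names are different Strings, and would therefore compare, hash, and group as different keys. The class and method names below are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetKeyDemo {

    /** Decodes raw input bytes into a String using the given charset. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e, written as an escape so the
        // encoding of this source file does not matter.
        String original = "caf\u00e9";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(raw, StandardCharsets.UTF_8);
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 decode equals original:   " + asUtf8.equals(original));
        System.out.println("Latin-1 decode equals original: " + asLatin1.equals(original));
    }
}
```

Names that contain only ASCII characters, like the ones in the log above, decode identically under both charsets, so this only matters for names with non-ASCII characters.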
