[英]mapreduce secondary sort doesn't work
我正在嘗試使用由以下組成的復合鍵在mapreduce中進行二級排序:
字符串自然鍵=程序名稱
長排序鍵=自1970年以來的毫秒時間
問題是排序后,我根據整個組合鍵得到了很多減速器
通過調試,我已驗證哈希碼和比較函數正確。 通過調試日志記錄,每個塊來自不同的reducer的位置表明,分組或分區均未成功。 從調試日志中:
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key=the voice
14/12/14 00:55:12 INFO popularitweet.EtanReducer: the voice: Thu Dec 11 17:51:03 +0000 2014
14/12/14 00:55:12 INFO popularitweet.EtanReducer: the voice: Thu Dec 11 17:51:03 +0000 2014
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key the voice ended
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key=top gear
14/12/14 00:55:12 INFO popularitweet.EtanReducer: top gear: Thu Dec 11 17:51:04 +0000 2014
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key top gear ended
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key=american horror story
14/12/14 00:55:12 INFO popularitweet.EtanReducer: american horror story: Thu Dec 11 17:51:04 +0000 2014
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key american horror story ended
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key=the voice
14/12/14 00:55:12 INFO popularitweet.EtanReducer: the voice: Thu Dec 11 17:51:04 +0000 2014
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key the voice ended
如您所見,聲音被發送到兩個不同的減速器,但時間戳不同。 任何幫助,將不勝感激。 組合鍵是以下類別:
public class ProgramKey implements WritableComparable<ProgramKey> {
private String program;
private Long timestamp;
public ProgramKey() {
}
public ProgramKey(String program, Long timestamp) {
this.program = program;
this.timestamp = timestamp;
}
@Override
public int compareTo(ProgramKey o) {
int result = program.compareTo(o.program);
if (result == 0) {
result = timestamp.compareTo(o.timestamp);
}
return result;
}
@Override
public void write(DataOutput dataOutput) throws IOException {
WritableUtils.writeString(dataOutput, program);
dataOutput.writeLong(timestamp);
}
@Override
public void readFields(DataInput dataInput) throws IOException {
program = WritableUtils.readString(dataInput);
timestamp = dataInput.readLong();
}
我實現的分區程序,GroupingComparator和SortingComparator是:
public class ProgramKeyPartitioner extends Partitioner<ProgramKey, TweetObject> {
@Override
public int getPartition(ProgramKey programKey, TweetObject tweetObject, int numPartitions) {
int hash = programKey.getProgram().hashCode();
int partition = hash % numPartitions;
return partition;
}
}
public class ProgramKeyGroupingComparator extends WritableComparator {
protected ProgramKeyGroupingComparator() {
super(ProgramKey.class, true);
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
ProgramKey k1 = (ProgramKey) a;
ProgramKey k2 = (ProgramKey) b;
return k1.getProgram().compareTo(k2.getProgram());
}
}
public class TimeStampComparator extends WritableComparator {
protected TimeStampComparator() {
super(ProgramKey.class, true);
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
ProgramKey ts1 = (ProgramKey)a;
ProgramKey ts2 = (ProgramKey)a;
int result = ts1.getProgram().compareTo(ts2.getProgram());
if (result == 0) {
result = ts1.getTimestamp().compareTo(ts2.getTimestamp());
}
return result;
}
}
public static void main(String[] args) throws IOException,
InterruptedException, ClassNotFoundException {
// Create configuration
Configuration conf = new Configuration();
// Create job
Job job = new Job(conf, "test1");
job.setJarByClass(EtanMapReduce.class);
// Set partitioner keyComparator and groupComparator
job.setPartitionerClass(ProgramKeyPartitioner.class);
job.setGroupingComparatorClass(ProgramKeyGroupingComparator.class);
job.setSortComparatorClass(TimeStampComparator.class);
// Setup MapReduce
job.setMapperClass(EtanMapper.class);
job.setMapOutputKeyClass(ProgramKey.class);
job.setMapOutputValueClass(TweetObject.class);
job.setReducerClass(EtanReducer.class);
// Specify key / value
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(TweetObject.class);
// Input
FileInputFormat.addInputPath(job, inputPath);
job.setInputFormatClass(TextInputFormat.class);
// Output
FileOutputFormat.setOutputPath(job, outputDir);
job.setOutputFormatClass(TextOutputFormat.class);
// Delete output if exists
FileSystem hdfs = FileSystem.get(conf);
if (hdfs.exists(outputDir))
hdfs.delete(outputDir, true);
// Execute job
logger.info("starting job");
int code = job.waitForCompletion(true) ? 0 : 1;
System.exit(code);
}
編輯...
您的TimeStampComparator似乎有錯字...將ts2設置為a時應將其設置為b:
ProgramKey ts1 = (ProgramKey)a;
ProgramKey ts2 = (ProgramKey)a;
什么時候應該是:
ProgramKey ts1 = (ProgramKey)a;
ProgramKey ts2 = (ProgramKey)b;
這將導致鍵/值對的排序不正確,並使分組比較器對鍵/值對進行排序的假設無效。
還要檢查原始程序名稱是否在UTF-8中,這正是WritableUtils假定的。 您系統的默認代碼頁也是UTF-8嗎?
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.