How to sort word count program output by value or count?
How do I sort my word count output by count (the value) rather than by the key?
In the normal case, the output is:
hi 2
hw 3
wr 1
r 3
But the desired output is:
wr 1
hi 2
hw 3
r 3
My code is:
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class sortingprog {

    // Emits (word, 1) for every token in the input line.
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Sums the 1s for each word. Because the word is the key,
    // the shuffle sorts this output by word, not by count.
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(sortingprog.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
What you are looking for is called "secondary sort". Here are two tutorials on how to sort values in your MapReduce job:
http://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/
http://codingjunkie.net/secondary-sort/
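A minimal plain-Java sketch of the underlying idea, assuming the word counts have already been computed. `SwapSortSimulation` and `sortByCount` are hypothetical names and no Hadoop classes are used. The framework sorts map-output keys during the shuffle, so a follow-up job that emits (count, word) instead of (word, count) gets its output ordered by count; here a `TreeMap` stands in for that sort-and-group step. This is the simpler key/value swap rather than a full secondary sort: it does not control the order of words that share a count, which is the gap secondary sort closes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop) of a second "sort by count" job.
public class SwapSortSimulation {

    // map: swap each (word, count) into (count, word)
    // shuffle: the TreeMap sorts and groups by the new key (the count)
    // reduce: write each word back out next to its count
    static List<String> sortByCount(Map<String, Integer> wordCounts) {
        TreeMap<Integer, List<String>> shuffled = new TreeMap<>();
        for (Map.Entry<String, Integer> e : wordCounts.entrySet()) {
            shuffled.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> group : shuffled.entrySet()) {
            // Words with the same count keep the input iteration order here;
            // in a real job their order inside the group is not guaranteed.
            for (String word : group.getValue()) {
                output.add(word + " " + group.getKey());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("hi", 2);
        counts.put("hw", 3);
        counts.put("wr", 1);
        counts.put("r", 3);
        sortByCount(counts).forEach(System.out::println);
    }
}
```

With the counts from the question this prints `wr 1`, `hi 2`, `hw 3`, `r 3`, one per line.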
You need to put both the word and its count into a composite key, like this:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class CustomPair implements WritableComparable<CustomPair> {
    private String fld1; // the word, e.g. "wr"
    private int fld2;    // its count, e.g. 1

    // Hadoop needs a no-arg constructor to create keys before calling readFields().
    public CustomPair() {
    }

    public CustomPair(String fld1, int fld2) {
        this.fld1 = fld1; // wr
        this.fld2 = fld2; // 1
    }

    @Override
    public int compareTo(CustomPair other) {
        // Count ascending (as in the desired output), then word, so equal counts are ordered too.
        int compareValue = Integer.compare(this.fld2, other.fld2);
        return compareValue != 0 ? compareValue : this.fld1.compareTo(other.fld1);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(fld1);
        out.writeInt(fld2);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        fld1 = in.readUTF();
        fld2 = in.readInt();
    }

    // Used by the default HashPartitioner. With more than one reducer, each
    // output file is sorted only on its own; use one reducer for a single sorted file.
    @Override
    public int hashCode() {
        return 31 * fld1.hashCode() + fld2;
    }

    @Override
    public String toString() {
        return fld1 + "\t" + fld2;
    }
}
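The ordering that `CustomPair` defines can be checked without a cluster. A minimal sketch, assuming Java 8+: `CompositeKeyDemo`, `WordCountKey` and `shuffleOrder` are hypothetical names, and `WordCountKey` copies only the comparison logic of `CustomPair` (no Writable serialization). Sorting a list of keys mimics what the shuffle does to composite map-output keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hadoop-free stand-in for CustomPair: same ordering, no serialization.
public class CompositeKeyDemo {

    static final class WordCountKey implements Comparable<WordCountKey> {
        final String word; // fld1 in CustomPair
        final int count;   // fld2 in CustomPair

        WordCountKey(String word, int count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public int compareTo(WordCountKey other) {
            // Primary order: count ascending; tie-break: word ascending.
            int byCount = Integer.compare(count, other.count);
            return byCount != 0 ? byCount : word.compareTo(other.word);
        }

        @Override
        public String toString() {
            return word + " " + count;
        }
    }

    // Sorts the keys the way the shuffle would sort map-output keys.
    static List<String> shuffleOrder(List<WordCountKey> keys) {
        List<WordCountKey> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        List<String> lines = new ArrayList<>();
        for (WordCountKey k : sorted) {
            lines.add(k.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<WordCountKey> keys = Arrays.asList(
                new WordCountKey("r", 3),
                new WordCountKey("hi", 2),
                new WordCountKey("hw", 3),
                new WordCountKey("wr", 1));
        shuffleOrder(keys).forEach(System.out::println);
    }
}
```

Because the word is part of the key, `hw 3` and `r 3` always come out in the same order, giving exactly the desired output from the question.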
Let me know if you need additional help.
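Disclaimer: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.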