How to sort word count program by value or count?

How do I sort my word count output by count (value) rather than by key?

In the normal case, the output is:

hi 2
hw 3 
wr 1 
r 3

but the desired output is:

wr 1
hi 2
hw 3
r 3

My code is:

public class sortingprog {
     public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, IntWritable, Text> {
         private final static IntWritable one = new IntWritable(1);
         private Text word = new Text();

         public void map(LongWritable key, Text value, OutputCollector<IntWritable, Text> output, Reporter reporter) throws IOException {
           String line = value.toString();
           StringTokenizer tokenizer = new StringTokenizer(line);
           while (tokenizer.hasMoreTokens()) {
             word.set(tokenizer.nextToken());
             output.collect(one,word);
           }
         }
       }


     public static class Reduce extends MapReduceBase implements Reducer<IntWritable,Text, IntWritable, Text> {
     public void reduce(Iterator<IntWritable> key, Text value, OutputCollector<IntWritable, Text> output, Reporter reporter) throws IOException {
            int sum=0;
           while (key.hasNext()) {
             sum+=key.next().get();
           }
           output.collect(new IntWritable(sum),value);

     }

    @Override
    public void reduce(IntWritable arg0, Iterator<Text> arg1,
            OutputCollector<IntWritable, Text> arg2, Reporter arg3)
            throws IOException {
        // TODO Auto-generated method stub

    }
     }

     public static class GroupComparator extends WritableComparator {
            protected GroupComparator() {
                super(IntWritable.class, true);
            }

            @SuppressWarnings("rawtypes")
            @Override
            public int compare(WritableComparable w1, WritableComparable w2) {
                IntWritable v1 = (IntWritable) w1;
                IntWritable v2 = (IntWritable) w2;          
                return -1 * v1.compareTo(v2);
            }
        }

       public static void main(String[] args) throws Exception {
         JobConf conf = new JobConf(sortingprog.class);
         conf.setJobName("wordcount");


         conf.setOutputKeyClass(IntWritable.class);
         conf.setOutputValueClass(Text.class);


         conf.setMapperClass(Map.class);
         conf.setReducerClass(Reduce.class);

         conf.setOutputValueGroupingComparator(GroupComparator.class);

         conf.setInputFormat(TextInputFormat.class);
         conf.setOutputFormat(TextOutputFormat.class);

         FileInputFormat.setInputPaths(conf, new Path(args[0]));
         FileOutputFormat.setOutputPath(conf, new Path(args[1]));

         JobClient.runJob(conf);
       }
}

What you are looking for is called "Secondary Sort". Here are two tutorials on how to sort by value in MapReduce:

http://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/

http://codingjunkie.net/secondary-sort/

You need to do the following:

  1. Create a custom WritableComparable that holds both fields (the word and its count).
  2. In the compareTo method, implement the logic for comparing two instances of the custom writable. The framework calls it during the shuffle and sort phase to order the keys before they reach the Reducer, which makes it the key to the whole implementation. In compareTo, compare only the second field (the count).

public class CustomPair implements WritableComparable<CustomPair> {
    private String fld1; // the word, e.g. "wr"
    private int fld2;    // its count, e.g. 1

    // Hadoop creates Writables reflectively, so a no-arg constructor is required.
    public CustomPair() {}

    public CustomPair(String fld1, int fld2) {
        this.fld1 = fld1;
        this.fld2 = fld2;
    }

    @Override
    public int compareTo(CustomPair other) {
        // Compare on the count only. Comparing other against this gives
        // descending order; swap the arguments for ascending order.
        return Integer.compare(other.fld2, this.fld2);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(fld1);
        out.writeInt(fld2);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        fld1 = in.readUTF();
        fld2 = in.readInt();
    }

    // You have to implement the rest of the methods (hashCode, equals, toString).
}

Let me know if you need additional help.
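
The compareTo idea from step 2 can be tried out without a cluster. The following is a minimal sketch in plain Java, not Hadoop code: `WordCount` and `WordCountSortSketch` are hypothetical names standing in for the custom key, and the sample data is taken from the question. It sorts ascending by count to match the desired output; breaking ties by word is an assumption added here so that the order is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the custom key: plain Java, no Hadoop types.
final class WordCount implements Comparable<WordCount> {
    final String word;
    final int count;

    WordCount(String word, int count) {
        this.word = word;
        this.count = count;
    }

    // Order by count ascending; break ties by word (an assumption, see above).
    @Override
    public int compareTo(WordCount other) {
        int byCount = Integer.compare(this.count, other.count);
        return byCount != 0 ? byCount : this.word.compareTo(other.word);
    }

    @Override
    public String toString() {
        return word + " " + count;
    }
}

final class WordCountSortSketch {
    // Returns a new list sorted by WordCount.compareTo; the input is not modified.
    static List<WordCount> sortByCount(List<WordCount> counts) {
        List<WordCount> sorted = new ArrayList<>(counts);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<WordCount> counts = Arrays.asList(
                new WordCount("hi", 2),
                new WordCount("hw", 3),
                new WordCount("wr", 1),
                new WordCount("r", 3));
        // Prints one "word count" pair per line, lowest count first.
        for (WordCount wc : sortByCount(counts)) {
            System.out.println(wc);
        }
    }
}
```

In a real job the same comparison would live in the key class's compareTo (or in a comparator registered on the job), so that the framework applies it during the sort phase.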
