
How to sort word count program by value or count?

How can I sort the word count output by count (value) instead of by key?

Normally, the output is

hi 2
hw 3 
wr 1 
r 3

But the desired output is

wr 1
hi 2
hw 3
r 3

My code is:

public class sortingprog {
     public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, IntWritable, Text> {
         private final static IntWritable one = new IntWritable(1);
         private Text word = new Text();

         public void map(LongWritable key, Text value, OutputCollector<IntWritable, Text> output, Reporter reporter) throws IOException {
           String line = value.toString();
           StringTokenizer tokenizer = new StringTokenizer(line);
           while (tokenizer.hasMoreTokens()) {
             word.set(tokenizer.nextToken());
             output.collect(one,word);
           }
         }
       }


     public static class Reduce extends MapReduceBase implements Reducer<IntWritable,Text, IntWritable, Text> {
     public void reduce(Iterator<IntWritable> key, Text value, OutputCollector<IntWritable, Text> output, Reporter reporter) throws IOException {
            int sum=0;
           while (key.hasNext()) {
             sum+=key.next().get();
           }
           output.collect(new IntWritable(sum),value);

     }

    @Override
    public void reduce(IntWritable arg0, Iterator<Text> arg1,
            OutputCollector<IntWritable, Text> arg2, Reporter arg3)
            throws IOException {
        // TODO Auto-generated method stub

    }
     }

     public static class GroupComparator extends WritableComparator {
            protected GroupComparator() {
                super(IntWritable.class, true);
            }

            @SuppressWarnings("rawtypes")
            @Override
            public int compare(WritableComparable w1, WritableComparable w2) {
                IntWritable v1 = (IntWritable) w1;
                IntWritable v2 = (IntWritable) w2;          
                return -1 * v1.compareTo(v2);
            }
        }

       public static void main(String[] args) throws Exception {
         JobConf conf = new JobConf(sortingprog.class);
         conf.setJobName("wordcount");


         conf.setOutputKeyClass(IntWritable.class);
         conf.setOutputValueClass(Text.class);


         conf.setMapperClass(Map.class);
         conf.setReducerClass(Reduce.class);

         conf.setOutputValueGroupingComparator(GroupComparator.class);

         conf.setInputFormat(TextInputFormat.class);
         conf.setOutputFormat(TextOutputFormat.class);

         FileInputFormat.setInputPaths(conf, new Path(args[0]));
         FileOutputFormat.setOutputPath(conf, new Path(args[1]));

         JobClient.runJob(conf);
       }
}

What you are looking for is called a "secondary sort". Here are two tutorials on how to sort values in MapReduce:

http://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/

http://codingjunkie.net/secondary-sort/

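You need to do the following.

  1. Create a custom WritableComparable that holds two fields.
  2. In the compareTo method, implement the logic that compares two of these custom writables. The framework calls this method to sort the keys before they reach the reducer, so it is the key part of the whole implementation. Here, compareTo only needs to compare on the second field, the value.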

public class CustomPair implements WritableComparable<CustomPair> {
    private String fld1; // the word, e.g. "wr"
    private int fld2;    // its count, e.g. 1

    public CustomPair(String fld1, int fld2) {
        this.fld1 = fld1;
        this.fld2 = fld2;
    }

    @Override
    public int compareTo(CustomPair other) {
        // Compares on the count only. "other" first gives descending order;
        // use Integer.compare(this.fld2, other.fld2) for the ascending
        // order shown in the question.
        return Integer.compare(other.fld2, this.fld2);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(fld1);
        out.writeInt(fld2);
    }

    // You have to implement the rest of the methods.
}
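The comparison and serialization logic in the CustomPair outline above can be tried without a Hadoop cluster. The following is only a sketch under stated assumptions: `CustomPairSketch`, `CustomPairSketchDemo`, `roundTrip` and `sortedSample` are hypothetical names, the class implements plain `Comparable` instead of Hadoop's `WritableComparable` so that it runs on the JDK alone, it sorts ascending to match the desired output in the question, and the tie-break on the word is an addition that is not part of the answer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for CustomPair. In a real job it would implement
// WritableComparable<CustomPairSketch> instead of Comparable (assumption).
final class CustomPairSketch implements Comparable<CustomPairSketch> {
    private String fld1; // the word
    private int fld2;    // its count

    CustomPairSketch() { } // no-arg constructor, used before readFields

    CustomPairSketch(String fld1, int fld2) {
        this.fld1 = fld1;
        this.fld2 = fld2;
    }

    // Ascending by count; ties are broken by word (an added assumption)
    // so that the resulting order is deterministic.
    @Override
    public int compareTo(CustomPairSketch other) {
        int byCount = Integer.compare(this.fld2, other.fld2);
        return byCount != 0 ? byCount : this.fld1.compareTo(other.fld1);
    }

    // Same shape as Writable.write(DataOutput).
    public void write(DataOutput out) throws IOException {
        out.writeUTF(fld1);
        out.writeInt(fld2);
    }

    // Same shape as Writable.readFields(DataInput); reads in write order.
    public void readFields(DataInput in) throws IOException {
        fld1 = in.readUTF();
        fld2 = in.readInt();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CustomPairSketch)) return false;
        CustomPairSketch p = (CustomPairSketch) o;
        return fld2 == p.fld2 && fld1.equals(p.fld1);
    }

    @Override
    public int hashCode() { return 31 * fld1.hashCode() + fld2; }

    @Override
    public String toString() { return fld1 + " " + fld2; }
}

final class CustomPairSketchDemo {
    // Serializes a pair to bytes and reads it back into a fresh object.
    static CustomPairSketch roundTrip(CustomPairSketch p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            p.write(new DataOutputStream(bytes));
            CustomPairSketch copy = new CustomPairSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sorts the question's sample data via compareTo, one pair per line.
    static String sortedSample() {
        List<CustomPairSketch> pairs = new ArrayList<>();
        pairs.add(roundTrip(new CustomPairSketch("hi", 2)));
        pairs.add(roundTrip(new CustomPairSketch("hw", 3)));
        pairs.add(roundTrip(new CustomPairSketch("wr", 1)));
        pairs.add(roundTrip(new CustomPairSketch("r", 3)));
        Collections.sort(pairs);
        StringBuilder sb = new StringBuilder();
        for (CustomPairSketch p : pairs) sb.append(p).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(sortedSample());
    }
}
```

Running `CustomPairSketchDemo` prints the four sample pairs ordered by count. In an actual job the same compareTo body would sit in the key class, and the counts would have to be computed by an earlier step before they can be used as part of the key.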
Let me know if you need additional help.
