
Failed to set a KeyComparator function

I'm trying to sort the data by value.

The method I use is to combine the key and value into a composite key,

e.g. (key, value) -> ({key,value}, value)

and to define my own KeyComparator, which compares the value part of the key.

My data is a paragraph of text whose words I need to count.

I wrote two jobs. The first one does the word count, but its reducer combines the key into a composite key.

This is the result:

is,4 4
the,15 15
ECA,1 1
to,6 6
.....

In the second job, I try to use the composite key to sort by value.

This is my mapper2:

public static class Map2 extends MapReduceBase
    implements Mapper<LongWritable,Text,Text,IntWritable>{

            private Text word = new Text();
            public void map(LongWritable key, Text value, OutputCollector<Text,IntWritable> output, Reporter reporter) throws IOException {
                    String line = value.toString();
                    String w1[] = line.split("\t");
                    word.set(w1[0]);
                    output.collect(word,new IntWritable(Integer.valueOf(w1[1])));
            }
    }

And here is my KeyComparator:

public static final class KeyComparator extends WritableComparator {
    public KeyComparator() {
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable tp1, WritableComparable tp2) {
        Text t1 = (Text) tp1;
        Text t2 = (Text) tp2;
        String a[] = t1.toString().split(",");
        String b[] = t2.toString().split(",");
        return a[1].compareTo(b[1]);
    }
}

This is my reducer2:

public static class Reduce2 extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

            public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException{
                    int sum=0;
                    while( values.hasNext()){
                            sum+= values.next().get();
                    }
                    //String cpKey[] = key.toString().split(",");
                    Text outputKey = new Text();
                    //outputKey.set(cpKey[0]);
                    output.collect(key, new IntWritable(sum));
            }

    }

Here is my main function:

 public static void main(String[] args) throws Exception {
            int reduceTasks = 1;
            int mapTasks = 3;

            System.out.println("1. New JobConf...");
            JobConf conf = new JobConf(WordCountV2.class);
            conf.setJobName("WordCount");

            System.out.println("2. Setting output key and value...");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            System.out.println("3. Setting Mapper and Reducer classes...");
            conf.setMapperClass(Map.class);
            conf.setReducerClass(Reduce.class);

            // set numbers of reducers
            System.out.println("4. Setting number of reduce and map tasks...");
            conf.setNumReduceTasks(reduceTasks);
            conf.setNumMapTasks(mapTasks);

            System.out.println("5. Setting input and output formats...");
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);


            System.out.println("6. Setting input and output paths...");
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            String TempDir = "temp" + Integer.toString(new Random().nextInt(1000)+1);
            FileOutputFormat.setOutputPath(conf, new Path(TempDir));
            //FileOutputFormat.setOutputPath(conf,new Path(args[1]));
            System.out.println("7. Running job...");
            JobClient.runJob(conf);
            JobConf sort = new JobConf(WordCountV2.class);
            sort.setJobName("sort");
            sort.setMapOutputKeyClass(Text.class);
            sort.setMapOutputValueClass(IntWritable.class);
            sort.setOutputKeyComparatorClass(KeyComparator.class);
            sort.setMapperClass(Map2.class);
            sort.setReducerClass(Reduce2.class);
            sort.setNumReduceTasks(reduceTasks);
            sort.setNumMapTasks(mapTasks);
            sort.setInputFormat(TextInputFormat.class);
            sort.setOutputFormat(TextOutputFormat.class);
            FileInputFormat.setInputPaths(sort,TempDir);
            FileOutputFormat.setOutputPath(sort, new Path(args[1]));
            JobClient.runJob(sort);


    }

But the result looks something like this:

is 13
the 32
ECA 21
to 14
...

and many words are lost.

But if I don't use my KeyComparator,

the result comes back unsorted, just like the first output I showed above.

Any ideas on how to solve the problem? Thanks!

I'm not sure where you are making the mistake.
But what you are trying to do, sorting based on value, is called a secondary sort.
It's not a trivial job: you need to create additional classes for partitioning, aggregation, and other tasks, which is clearly explained here and here.
Following the instructions in those blog posts should help.
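
One likely cause of the lost words, offered as an assumption rather than a confirmed diagnosis: the comparator in the question treats any two keys with the same count as equal, and in the old `mapred` API the reducer groups keys with the sort comparator unless a separate one is set through `setOutputValueGroupingComparator`. Words that share a count would then be merged into a single reduce call and their values summed. The counts are also compared as strings, so "15" sorts before "4". The plain-Java sketch below has no Hadoop dependency; the class name `CompositeKeyOrder`, its helpers, and the extra sample keys `of,4` and `a,6` are made up for illustration. It contrasts that ordering with one that compares counts numerically and breaks ties on the word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class CompositeKeyOrder {

    // Hypothetical helper: the count part of a composite key "word,count".
    static int countOf(String compositeKey) {
        return Integer.parseInt(compositeKey.substring(compositeKey.lastIndexOf(',') + 1));
    }

    // Hypothetical helper: the word part of a composite key "word,count".
    static String wordOf(String compositeKey) {
        return compositeKey.substring(0, compositeKey.lastIndexOf(','));
    }

    // Same logic as the comparator in the question: string comparison of the count only.
    static final Comparator<String> COUNT_ONLY =
            (a, b) -> a.split(",")[1].compareTo(b.split(",")[1]);

    // Alternative: numeric comparison of the count, then the word as a tie-breaker,
    // so two different words never compare as equal.
    static final Comparator<String> BY_COUNT_THEN_WORD = (a, b) -> {
        int byCount = Integer.compare(countOf(a), countOf(b));
        return byCount != 0 ? byCount : wordOf(a).compareTo(wordOf(b));
    };

    // Counts how many keys the given ordering considers distinct.
    static int distinctKeys(List<String> keys, Comparator<String> order) {
        TreeSet<String> set = new TreeSet<>(order);
        set.addAll(keys);
        return set.size();
    }

    public static void main(String[] args) {
        // The first four keys come from the question; "of,4" and "a,6" are invented.
        List<String> keys = Arrays.asList("is,4", "the,15", "ECA,1", "to,6", "of,4", "a,6");

        System.out.println(distinctKeys(keys, COUNT_ONLY));         // prints 4
        System.out.println(distinctKeys(keys, BY_COUNT_THEN_WORD)); // prints 6

        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(BY_COUNT_THEN_WORD);
        System.out.println(sorted); // prints [ECA,1, is,4, of,4, a,6, to,6, the,15]
    }
}
```

In the job itself, the same tie-breaking logic would go in the body of `KeyComparator.compare`, operating on the `Text` keys after `toString()`.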
