
How to sort a column in a data set in descending order using Java Hadoop MapReduce?

My data file is:

Utsav   Chatterjee  Dangerous   Soccer  Coldplay    4
Rodney  Purtle  Awesome Football    Maroon5 3
Michael Gross   Amazing Basketball  Iron Maiden 6
Emmanuel    Ezeigwe Cool    Pool    Metallica   5
John    Doe Boring  Golf    Linkin Park 8
David   Bekham  Godlike Soccer  Justin Beiber   89
Abhishek    Kumar   Geek    Cricket Abhishek Kumar  7
Abhishek    Singh   Geek    Cricket Abhishek Kumar  7

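I want to pass the column number as an argument when invoking hadoop jar, and I need the whole data set sorted in descending order on that particular column. I can easily do this in ascending order by setting the required column as the key in the mapper output. However, I have not been able to get it working in descending order.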

My Mapper and Reducer code is:

public static class Map extends Mapper<LongWritable,Text,Text,Text>{
        public static void map(LongWritable key, Text value, Context context)
        throws IOException,InterruptedException 
        {
            Configuration conf = context.getConfiguration();
            String param = conf.get("columnRef");
            int colref = Integer.parseInt(param);
            String line = value.toString();
            String[] parts = line.split("\t");
            context.write(new Text(parts[colref]), value);
            }
        }

    public static class Reduce extends Reducer<Text,Text,Text,Text>{
        public void reduce(Text key, Iterable<Text> value, Context context)
        throws IOException,InterruptedException 
        {
            for (Text text : value) {
                context.write(text,null );
            }
        }
    }

My comparator class is:

public static class sortComparator extends WritableComparator {

         protected sortComparator() {
          super(LongWritable.class, true);
          // TODO Auto-generated constructor stub
         }

         @Override
         public int compare(WritableComparable o1, WritableComparable o2) {
          LongWritable k1 = (LongWritable) o1;
          LongWritable k2 = (LongWritable) o2;
          int cmp = k1.compareTo(k2);
          return -1 * cmp;
         }

        }

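I am probably doing something wrong with the comparator. Can someone help me out here? When I run this and choose the column at index 5 (the last, numeric column) as the basis for the sort, I still get the results in ascending order.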

Driver class:

public static void main(String[] args) throws Exception {

        Configuration conf= new Configuration();
        conf.set("columnRef", args[2]);

        Job job = new Job(conf, "Sort");

        job.setJarByClass(Sort.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setSortComparatorClass(DescendingKeyComparator.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        Path outputPath = new Path(args[1]);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        outputPath.getFileSystem(conf).delete(outputPath);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

Any suggestions on how to accomplish this (descending order) would be really helpful! Thanks.

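In the driver class, look at this line: job.setSortComparatorClass(DescendingKeyComparator.class);

You have set the comparator class to DescendingKeyComparator.class, but the comparator you wrote is called sortComparator. Set it to sortComparator.class and it should work.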
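One more thing that may be worth checking: the comparator in the question is constructed for LongWritable, while the mapper emits Text keys, so the key type would probably need to match as well. The descending comparison itself can be sketched without any Hadoop classes, as below. This is only an illustration under stated assumptions: the names DescendingColumnSort, DESCENDING_KEY and sortRows are hypothetical, and the column is assumed to be either entirely numeric or entirely text. In the actual job the same logic would sit inside the compare method of a WritableComparator subclass created with super(Text.class, true), applied to the string values of the two keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Stand-alone sketch of a descending key comparison; no Hadoop classes are used. */
final class DescendingColumnSort {

    /**
     * Orders keys from largest to smallest. Keys that both parse as long are
     * compared numerically; otherwise they are compared as plain strings.
     * Assumption: a given column is all numeric or all text, never mixed.
     */
    static final Comparator<String> DESCENDING_KEY = (a, b) -> {
        try {
            // Swapping the arguments (b before a) reverses the ascending order.
            return Long.compare(Long.parseLong(b.trim()), Long.parseLong(a.trim()));
        } catch (NumberFormatException e) {
            return b.compareTo(a);
        }
    };

    /** Returns a copy of rows sorted in descending order of the tab-separated column col. */
    static List<String> sortRows(List<String> rows, int col) {
        List<String> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((String row) -> row.split("\t")[col], DESCENDING_KEY));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "John\tDoe\tBoring\tGolf\tLinkin Park\t8",
                "David\tBekham\tGodlike\tSoccer\tJustin Beiber\t89",
                "Rodney\tPurtle\tAwesome\tFootball\tMaroon5\t3");
        // Sort on column index 5, the numeric column from the question.
        sortRows(rows, 5).forEach(System.out::println);
    }
}
```

Parsing the key as a number matters for the last column: compared as text, "89" would sort next to "8" rather than ahead of every single-digit value.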
