How to sort a data set by one column in descending order using Java Hadoop MapReduce?

My data file is:

Utsav   Chatterjee  Dangerous   Soccer  Coldplay    4
Rodney  Purtle  Awesome Football    Maroon5 3
Michael Gross   Amazing Basketball  Iron Maiden 6
Emmanuel    Ezeigwe Cool    Pool    Metallica   5
John    Doe Boring  Golf    Linkin Park 8
David   Bekham  Godlike Soccer  Justin Beiber   89
Abhishek    Kumar   Geek    Cricket Abhishek Kumar  7
Abhishek    Singh   Geek    Cricket Abhishek Kumar  7

I want to pass the column number as an argument when invoking hadoop jar, and I need the entire data set sorted on that column in descending order. I can do this easily in ascending order by setting the required column as the key in the mapper output, but I have not been able to get descending order.

My Mapper and Reducer code is:

public static class Map extends Mapper<LongWritable,Text,Text,Text>{
        // map must be an instance method (not static) to override Mapper.map
        @Override
        public void map(LongWritable key, Text value, Context context)
        throws IOException,InterruptedException
        {
            // Column index set by the driver under the "columnRef" key
            Configuration conf = context.getConfiguration();
            String param = conf.get("columnRef");
            int colref = Integer.parseInt(param);
            String line = value.toString();
            String[] parts = line.split("\t");
            // Emit the chosen column as the key so the shuffle sorts on it
            context.write(new Text(parts[colref]), value);
        }
    }

    public static class Reduce extends Reducer<Text,Text,Text,Text>{
        public void reduce(Text key, Iterable<Text> value, Context context)
        throws IOException,InterruptedException 
        {
            for (Text text : value) {
                context.write(text,null );
            }
        }
    }
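The key extraction in the mapper above can be tried outside Hadoop. The following is only a minimal sketch in plain Java, assuming the input is tab-separated as the mapper's split("\t") implies; the class name ColumnKey and the method extract are hypothetical and not part of the original code. Unlike parts[colref], it returns null for a missing column instead of throwing.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post.
final class ColumnKey {
    private static final Pattern TAB = Pattern.compile("\t");

    // Returns the field at index colRef, or null if the line has no such column.
    static String extract(String line, int colRef) {
        if (line == null || colRef < 0) {
            return null;
        }
        // Limit -1 keeps trailing empty fields, so column positions stay stable.
        String[] parts = TAB.split(line, -1);
        return colRef < parts.length ? parts[colRef] : null;
    }

    public static void main(String[] args) {
        String row = "Michael\tGross\tAmazing\tBasketball\tIron Maiden\t6";
        System.out.println(extract(row, 5)); // prints 6
        System.out.println(extract(row, 9)); // prints null
    }
}
```

Inside a mapper, a null result could be used to skip a malformed row rather than fail the task; whether skipping is acceptable depends on the data.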

My comparator class is:

public static class sortComparator extends WritableComparator {

    protected sortComparator() {
        super(LongWritable.class, true);
    }

    @Override
    public int compare(WritableComparable o1, WritableComparable o2) {
        LongWritable k1 = (LongWritable) o1;
        LongWritable k2 = (LongWritable) o2;
        // Negate the natural (ascending) comparison to get descending order
        int cmp = k1.compareTo(k2);
        return -1 * cmp;
    }

}

I'm probably doing something wrong with the comparator. Can anyone help me out here? When I run this, picking column with index 5 (the numeric last column) to be the basis for this sort, I still get my result in ascending order.

Driver class:

public static void main(String[] args) throws Exception {

        Configuration conf= new Configuration();
        conf.set("columnRef", args[2]);

        Job job = new Job(conf, "Sort");

        job.setJarByClass(Sort.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setSortComparatorClass(DescendingKeyComparator.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        Path outputPath = new Path(args[1]);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        outputPath.getFileSystem(conf).delete(outputPath);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

Any advice on how I might achieve this (descending order) would be very helpful. Thanks!

In your driver class, you register the sort comparator with this line: job.setSortComparatorClass(DescendingKeyComparator.class);

That registers DescendingKeyComparator.class, not the comparator you wrote. Set it to sortComparator.class instead, so that the job actually uses your comparator.
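One further point, going only by the code as posted: the comparator is constructed for LongWritable and casts its arguments to LongWritable, while the mapper emits Text keys, so the comparator's key type would also need to match the map output key. Below is a minimal, Hadoop-free sketch of the comparison such a comparator could apply to the keys' string values. The class and method names are hypothetical, and treating keys as numbers when both parse as long is an assumption made here so that, for example, 10 sorts ahead of 9.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original post.
final class DescendingKeyOrder {

    // Orders larger values first. Numeric keys compare by value and sort
    // ahead of non-numeric keys; non-numeric keys compare in reverse
    // lexicographic order. This keeps the ordering total and transitive.
    static int compareDescending(String a, String b) {
        Long x = tryParse(a);
        Long y = tryParse(b);
        if (x != null && y != null) {
            return Long.compare(y, x); // arguments swapped: descending
        }
        if (x != null) {
            return -1; // numeric before non-numeric
        }
        if (y != null) {
            return 1;
        }
        return b.compareTo(a); // reverse lexicographic
    }

    // Returns the parsed value, or null if s is not a valid long.
    private static Long tryParse(String s) {
        try {
            return Long.valueOf(s.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Values of the last column in the sample data
        List<String> keys = new ArrayList<>(
                Arrays.asList("4", "3", "6", "5", "8", "89", "7", "7"));
        keys.sort(DescendingKeyOrder::compareDescending);
        System.out.println(keys); // prints [89, 8, 7, 7, 6, 5, 4, 3]
    }
}
```

In the job itself, a WritableComparator subclass constructed with Text.class could call this from compare(WritableComparable, WritableComparable) on o1.toString() and o2.toString(). That wiring is untested here and is offered as a starting point, not a verified fix.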
