Hadoop WordCount sorted by word occurrences

Question

I need to run WordCount so that it gives me all the words and their occurrence counts, sorted by count rather than alphabetically.

I understand that I need to create two jobs for this and run one after the other. I used the mapper and the reducer from "Sorted word count using Hadoop MapReduce":

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapreduce.Job;

public class WordCount {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    class Map1 extends MapReduceBase implements Mapper<Object, Text, IntWritable, Text> {

        public void map(Object key, Text value, OutputCollector<IntWritable, Text> collector, Reporter arg3) throws IOException {
            String line = value.toString();
            StringTokenizer stringTokenizer = new StringTokenizer(line);
            {
                int number = 999;
                String word = "empty";

                if (stringTokenizer.hasMoreTokens()) {
                    String str0 = stringTokenizer.nextToken();
                    word = str0.trim();
                }

                if (stringTokenizer.hasMoreElements()) {
                    String str1 = stringTokenizer.nextToken();
                    number = Integer.parseInt(str1.trim());
                }
                collector.collect(new IntWritable(number), new Text(word));
            }

        }

    }

    class Reduce1 extends MapReduceBase implements Reducer<IntWritable, Text, IntWritable, Text> {

        public void reduce(IntWritable key, Iterator<Text> values, OutputCollector<IntWritable, Text> arg2, Reporter arg3) throws IOException {
            while ((values.hasNext())) {
                arg2.collect(key, values.next());
            }
        }

    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordCount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path("/tmp/temp"));

        //JobClient.runJob(conf);
        //------------------------------------------------------------------
        JobConf conf2 = new JobConf(WordCount.class);
        conf2.setJobName("WordCount1");

        conf2.setOutputKeyClass(Text.class);
        conf2.setOutputValueClass(IntWritable.class);

        conf2.setMapperClass(Map1.class);
        conf2.setCombinerClass(Reduce1.class);
        conf2.setReducerClass(Reduce1.class);

        conf2.setInputFormat(TextInputFormat.class);
        conf2.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf2, new Path("/tmp/temp/part-00000"));
        FileOutputFormat.setOutputPath(conf2, new Path(args[1]));

        Job job1 = new Job(conf);
        Job job2 = new Job(conf2);

        job1.submit();
        if (job1.waitForCompletion(true)) {
            job2.submit();
            job1.waitForCompletion(true);
        }

    }
}

It's not working. What should I change here, or why is it not working?

Answer 1

If the program runs until it prints:

    INFO input.FileInputFormat: Total input paths to process : 1

then the problem lies in your last line:

    job2.submit();

The second job has been submitted, but nothing waits for it to be processed: the line that follows calls waitForCompletion on job1 again instead of on job2. Try this:

    job1.submit();
    if (job1.waitForCompletion(true)) {
        job2.submit();
        job2.waitForCompletion(true);
    }

to process your sorter MR job. I've tried your code with the new MapReduce API and the flow works.

Just change the last line so that it waits on job2.
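To make the two-pass idea concrete, here is a minimal plain-Java sketch of what the two jobs compute, with no Hadoop involved. The class name SortByCountSketch and its two methods are hypothetical names chosen for illustration; they are not part of the question's code or of any Hadoop API. The first method stands in for the counting job, the second for the sorter job that makes the count the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local stand-in for the two chained jobs; not Hadoop code.
class SortByCountSketch {

    // Pass 1 (the role of Map + Reduce): count each whitespace-separated token.
    // A TreeMap keeps the words in alphabetical order, like the first job's output.
    public static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Pass 2 (the role of Map1 + Reduce1): treat the count as the key and
    // order the records by it, ascending. List.sort is stable, so in this
    // sketch words with equal counts stay in alphabetical order.
    public static List<String> sortByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            result.add(entry.getValue() + "\t" + entry.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("b a b", "c b c");
        for (String record : sortByCount(countWords(lines))) {
            System.out.println(record);
        }
    }
}
```

The second pass can only start once the first has produced all of its counts, which is why the driver has to wait for job1 before submitting job2, and then wait for job2 as well.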

Answer 1: accepted, 2014-03-12 09:37:28