How to count the occurrences of a particular word in a file using Hadoop MapReduce programming?

I am trying to count the occurrences of a particular word in a file using Hadoop MapReduce programming in Java. Both the file and the word should be user input, so I am trying to pass the word as a third argument along with the input and output paths (In, Out, Word). However, I cannot find a way to pass the word to the map function. I tried the following, but it did not work: I created a static String variable in the mapper class, assigned the value of my third argument (i.e., the word to search for) to it, and then used this static variable inside the map function. Inside the map function, however, the static variable's value is null. I am unable to get the third argument's value inside the map function.

Is there any way to set the value via the JobConf object? Please help. I have pasted my code below.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class MyWordCount {

    public static class MyWordCountMap extends Mapper<Text, Text, Text, LongWritable> {
        static String wordToSearch;
        private final static LongWritable ONE = new LongWritable(1L);
        private Text word = new Text();

        @Override
        public void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            System.out.println(wordToSearch); // Here the value comes out as null
            if (value.toString().compareTo(wordToSearch) == 0) {
                word.set(wordToSearch); // use the searched word as the output key
                context.write(word, ONE);
            }
        }
    }

    public static class SumReduce extends Reducer<Text, LongWritable, Text, LongWritable> {

        // Must take Iterable (not Iterator) to override Reducer.reduce
        @Override
        public void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0L;
            for (LongWritable value : values) {
                sum += value.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] rawArgs) throws Exception {
        GenericOptionsParser parser = new GenericOptionsParser(rawArgs);
        Configuration conf = parser.getConfiguration();
        String[] args = parser.getRemainingArgs(); // In, Out, Word
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(MyWordCount.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setMapperClass(MyWordCountMap.class);
        job.setReducerClass(SumReduce.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        String myWord = args[2];
        MyWordCountMap.wordToSearch = myWord;
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
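A second pitfall in code like this: if reduce is declared with an Iterator<LongWritable> parameter instead of Iterable<LongWritable>, it does not override Reducer.reduce, it merely overloads it, so Hadoop silently runs the default identity reduce and no summing happens. The plain-Java sketch below reproduces that effect without Hadoop; BaseReducer, IteratorReducer, IterableReducer and runFramework are hypothetical stand-ins, not Hadoop classes. Adding @Override to the real reducer makes the compiler catch the mistake.

```java
import java.util.Arrays;
import java.util.Iterator;

// Plain-Java stand-ins for Hadoop's Reducer (hypothetical names, no Hadoop needed).
public class OverrideDemo {

    // Mimics the framework's base class: reduce(key, Iterable) whose default
    // implementation passes every value through unchanged (identity reduce).
    static class BaseReducer {
        String reduce(String key, Iterable<Long> values) {
            StringBuilder sb = new StringBuilder();
            for (Long v : values) {
                sb.append(key).append('=').append(v).append(';');
            }
            return sb.toString();
        }
    }

    // Wrong: an Iterator parameter makes this an overload, which the "framework" never calls.
    static class IteratorReducer extends BaseReducer {
        String reduce(String key, Iterator<Long> values) {
            long sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            return key + "=" + sum + ";";
        }
    }

    // Right: an Iterable parameter overrides the base method; @Override lets javac verify it.
    static class IterableReducer extends BaseReducer {
        @Override
        String reduce(String key, Iterable<Long> values) {
            long sum = 0;
            for (Long v : values) {
                sum += v;
            }
            return key + "=" + sum + ";";
        }
    }

    // The "framework" only knows the base type and always passes an Iterable.
    static String runFramework(BaseReducer reducer) {
        return reducer.reduce("Tree", Arrays.asList(1L, 1L, 1L));
    }

    public static void main(String[] args) {
        System.out.println(runFramework(new IteratorReducer())); // Tree=1;Tree=1;Tree=1;
        System.out.println(runFramework(new IterableReducer())); // Tree=3;
    }
}
```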

There is a way to do this with Configuration (see the API here). As an example, the following code sets "Tree" as the word to be searched. Set the value before creating the Job, because the Job works on its own copy of the Configuration; to change it afterwards, use job.getConfiguration().set(...).

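import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Create a new configuration
Configuration conf = new Configuration();
// Set the word to be searched
conf.set("wordToSearch", "Tree");
// Create the job (it takes a copy of conf)
Job job = Job.getInstance(conf);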
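Why this works when the static field does not: on a cluster, map and reduce tasks run in separate JVMs, so a static field assigned in the driver's main() is never set there. What does reach the tasks is the job's Configuration, which Hadoop serializes (as job.xml) at submission and each task loads again. The sketch below is only a plain-Java analogy of that hand-off using java.util.Properties; the class and method names are hypothetical and this is not Hadoop's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Plain-Java analogy (not Hadoop code): driver settings are serialized as XML,
// shipped to a task, and parsed again there. Static fields are not part of this.
public class ConfigHandoffSketch {

    // Driver side: store the search word as a string setting and serialize it,
    // loosely mirroring how the job configuration is written out for the tasks.
    static byte[] driverSubmit(String word) {
        Properties conf = new Properties();
        conf.setProperty("wordToSearch", word);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            conf.storeToXML(out, "job configuration");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Task side: a fresh process only sees what was serialized.
    static String taskSetup(byte[] shippedConf) {
        Properties conf = new Properties();
        try {
            conf.loadFromXML(new ByteArrayInputStream(shippedConf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return conf.getProperty("wordToSearch");
    }

    public static void main(String[] args) {
        byte[] shipped = driverSubmit("Tree");
        System.out.println(taskSetup(shipped)); // Tree
    }
}
```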

Then, in your mapper or reducer class, you can retrieve wordToSearch (i.e., "Tree" in this example) as follows:

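// Get the job's Configuration from the task context
// (typically done once in setup(Context) rather than in every map() call)
Configuration conf = context.getConfiguration();
// Retrieve the wordToSearch value set by the driver
String wordToSearch = conf.get("wordToSearch");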
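Once the word is available in the task, the map logic still has to find it. The question's map compares the whole input value with the word; if each record is a line of text, one option is to split it into tokens and emit a 1 per match. The plain-Java sketch below assumes whitespace tokenization and case-sensitive matching; WordSearchMapSketch and its methods are hypothetical, and the constructor argument stands in for reading context.getConfiguration().get("wordToSearch") once in the Mapper's setup(Context) and keeping it in a field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Plain-Java sketch of per-record map logic plus a summing step (no Hadoop needed).
public class WordSearchMapSketch {
    // In a real Mapper this field would be filled in setup(Context).
    private final String wordToSearch;

    WordSearchMapSketch(String wordToSearch) {
        this.wordToSearch = wordToSearch;
    }

    // Emits 1 for every whitespace-separated token equal to the search word
    // (case-sensitive; punctuation is not stripped).
    List<Long> map(String line) {
        List<Long> emitted = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            if (tokens.nextToken().equals(wordToSearch)) {
                emitted.add(1L);
            }
        }
        return emitted;
    }

    // Stand-in for the reducer: sums the emitted ones.
    static long sum(List<Long> ones) {
        long total = 0;
        for (long one : ones) {
            total += one;
        }
        return total;
    }

    public static void main(String[] args) {
        WordSearchMapSketch mapper = new WordSearchMapSketch("Tree");
        List<Long> emitted = new ArrayList<>();
        emitted.addAll(mapper.map("Tree tree Tree"));
        emitted.addAll(mapper.map("a Tree, and a Tree"));
        System.out.println(sum(emitted)); // 3 ("tree" and "Tree," are not counted)
    }
}
```

Stripping punctuation or lower-casing tokens before the comparison would be a natural refinement, depending on what "word" should mean for your input.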

See here for more details.


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM