简体   繁体   中英

How to count the occurence of particular word in a file using hadoop mapreduce programming?

I am trying to count the occurrence of a word in a file using hadoop mapreduce programming in java. 单词的出现。 Both the file and the word should be an user input. So I am trying to pass the particular word as third argument along with the i/p and o/p paths . 传递。 But i am not able to find out a way to pass the word to the map function. I have tried the following way but it did not work: - created a static String variable in mapper class and assigned the value of my 3rd argument(ie. word to be searched) to it. And then tried to use this static variable inside map function. But inside map function the static variables value came as Null. I am unable to get the third arument's value inside map function.

Is there anyway to set the value via JobConf object? Please help. I have pasted my code below.

public class MyWordCount {

    public static class MyWordCountMap extends Mapper < Text, Text, Text, LongWritable > {
        static String wordToSearch;
        private final static LongWritable ONE = new LongWritable(1L);
        private Text word = new Text();
        public void map(Text key, Text value, Context context)
        throws IOException, InterruptedException {
            System.out.println(wordToSearch); // Here the value is coming as Null
            if (value.toString().compareTo(wordToSearch) == 0) {
                context.write(word, ONE);
            }
        }
    }


    public static class SumReduce extends Reducer < Text, LongWritable, Text, LongWritable > {

        public void reduce(Text key, Iterator < LongWritable > values,
            Context context) throws IOException, InterruptedException {
            long sum = 0L;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] rawArgs) throws Exception {

        GenericOptionsParser parser = new GenericOptionsParser(rawArgs);
        Configuration conf = parser.getConfiguration();
        String[] args = parser.getRemainingArgs();
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(MyWordCountMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setMapperClass(MyWordCountMap.class);
        job.setReducerClass(SumReduce.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        String MyWord = args[2];
        MyWordCountMap.wordToSearch = MyWord;
        job.waitForCompletion(true);
    }

}

There is a way to do this with Configuration (see api here ). As an example, the following code can be used which sets "Tree" as the word to be searched:

//Create a new configuration
Configuration conf = new Configuration();
//Set the work to be searched
conf.set("wordToSearch", "Tree");
//create the job
Job job = new Job(conf);

Then, in your mapper/reducer class you can get wordToSearch (ie, "Tree" in this example) using the following:

//Create a new configuration
Configuration conf = context.getConfiguration();
//retrieve the wordToSearch variable
String wordToSearch = conf.get("wordToSearch");

See here for more details.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM