
Can you explain the word count MapReduce program step by step?

Can you explain a MapReduce program step by step? For example, in the word count program below, the Map class is a class inside a class (a nested class). What is the meaning of the angle brackets? Why do we write the output parameters as well? What is the Context object? I know the logic, but I can't understand a few of the Java statements.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>:
    // input is (byte offset of line, line text), output is (word, 1).
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);   // emit (word, 1) for every token
            }
        }
    }

    // Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>: its input types must match
    // the mapper's output types; output is (word, total count).
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();           // add up all the 1s for this word
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Job.getInstance replaces the deprecated "new Job(conf, name)" constructor.
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Your Map class extends Hadoop's Mapper class. The angle brackets are Java generics: they declare the four types the mapper works with. The first two are the input key/value types and the last two are the output key/value types. You have to declare the output types too, because Hadoop must know at job-setup time how to serialize the pairs the mapper emits and what types the reducer should expect. The Mapper subclass overrides the map() method, which is where your mapper logic goes. The method accepts the declared input key and value, returns void, and instead emits its output key/value pairs by calling write() on the Context object, the framework's handle for collecting output (and for accessing configuration, counters, and job status).
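To make the angle brackets concrete, here is a minimal sketch of a different, purely hypothetical mapper (not part of the job above, and assuming the same imports), showing how the four type parameters line up:

public static class LineLengthMap
        extends Mapper<LongWritable, Text,   // input : (byte offset, line text)
                       Text, IntWritable> {  // output: (label, line length)
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key   = byte offset of the line (KEYIN  = LongWritable)
        // value = the line itself         (VALUEIN = Text)
        // Emit each line's length in bytes under a constant key.
        context.write(new Text("length"), new IntWritable(value.getLength()));
    }
}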

Your Reduce class extends Hadoop's Reducer class. Its input key/value types must match the output key/value types of the Mapper (or of the Combiner, if one is set). The Reducer subclass overrides the reduce() method, which is where your reducer logic goes. By the time reduce() is called, the framework has already grouped together all values that share the same key, so the method receives one key plus an Iterable of its values, returns void, and writes the aggregated pair back through the Context object.
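For intuition, here is a plain-Java simulation (no Hadoop involved, class name is made up) of one reduce() call. If the input text contained "hello" twice, the shuffle delivers both 1s together:

import java.util.Arrays;
import java.util.List;

public class ReduceSimulation {
    public static void main(String[] args) {
        // Grouped values for the key "hello" after shuffle/sort.
        List<Integer> values = Arrays.asList(1, 1);
        int sum = 0;
        for (int v : values) {
            sum += v;                        // same summation as reduce()
        }
        System.out.println("hello\t" + sum); // prints: hello	2
    }
}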

Between these two methods, Hadoop performs the optional combine step plus the sort and shuffle: mapper output is partitioned by key, transferred to the reducers, and sorted so that all values for the same key arrive together.
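Because summing counts is associative and commutative, the same Reduce class can also serve as the combiner, pre-aggregating each mapper's local output before the shuffle. This is optional but a common optimization; it is one extra line in main():

// Optional: pre-aggregate (word, 1) pairs on the map side to cut
// network traffic. Safe here because addition is associative/commutative.
job.setCombinerClass(Reduce.class);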

Your main method contains the code that configures and submits the Hadoop job: the output key/value types, the mapper and reducer classes, the input/output formats, and the input/output paths taken from the command-line arguments.
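Once the class is packaged into a jar (the jar name and paths below are only examples), the job is launched with something like hadoop jar wordcount.jar WordCount /user/hadoop/input /user/hadoop/output, where the two arguments become args[0] and args[1]. Note that the output directory must not already exist, or the job will fail at submission.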

A few more clarifications are available from macalester.edu and javacodegeeks.
