
Can you explain the word count MapReduce program step by step?

Can you explain a MapReduce program? For example, in the word count program there is a class inside a class (an inner class). Can you explain the program step by step? What do the angle brackets mean? Why do we also write the output parameters? What is the Context object? I know the logic, but I can't understand a few of the Java statements.

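import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {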

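// Type parameters: input key, input value, output key, output value.
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
   private final static IntWritable one = new IntWritable(1);
   private Text word = new Text();

   // Called once per input line: key is the line's byte offset, value is the line text.
   @Override
   public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {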
       String line = value.toString();
       StringTokenizer tokenizer = new StringTokenizer(line);
       while (tokenizer.hasMoreTokens()) {
           word.set(tokenizer.nextToken());
           context.write(word, one);
       }
   }
} 

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

   // Called once per distinct word, with every count emitted for that word.
   @Override
   public void reduce(Text key, Iterable<IntWritable> values, Context context)
     throws IOException, InterruptedException {
       int sum = 0;
       for (IntWritable val : values) {
           sum += val.get();
       }
       context.write(key, new IntWritable(sum));
   }
}

public static void main(String[] args) throws Exception {
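   Configuration conf = new Configuration();

   // Job.getInstance is the non-deprecated way to create a job.
   Job job = Job.getInstance(conf, "wordcount");
   // Lets Hadoop locate the jar that contains these classes.
   job.setJarByClass(WordCount.class);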

   job.setOutputKeyClass(Text.class);
   job.setOutputValueClass(IntWritable.class);

   job.setMapperClass(Map.class);
   job.setReducerClass(Reduce.class);

   job.setInputFormatClass(TextInputFormat.class);
   job.setOutputFormatClass(TextOutputFormat.class);

   FileInputFormat.addInputPath(job, new Path(args[0]));
   FileOutputFormat.setOutputPath(job, new Path(args[1]));

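   // Block until the job finishes and report success or failure as the exit code.
   System.exit(job.waitForCompletion(true) ? 0 : 1);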
}

}

Your Map class extends Hadoop's Mapper class. The angle brackets are Java generics: they declare the input and output types. The first two type parameters are the input key and value, and the last two are the output key and value, which is why the output types are written as well. With TextInputFormat, the input key is the byte offset of a line and the input value is the line itself. The Mapper subclass overrides the map() method, and your mapper logic goes there. The method receives one input key-value pair, returns void, and emits its output key-value pairs by calling context.write(). The Context object is the mapper's link to the framework, which collects whatever is written to it.
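To make the generics and the Context concrete, here is a minimal plain-Java sketch that imitates the shape of a mapper without using Hadoop at all. SketchMapper, TokenMapper, and mapLine are hypothetical names invented for this illustration, and ordinary Long, String, and Integer stand in for LongWritable, Text, and IntWritable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Plain-Java imitation of the Mapper shape; none of these types are Hadoop classes.
class GenericsSketch {

    // Four type parameters: input key, input value, output key, output value.
    abstract static class SketchMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

        // The context is the channel through which map() hands pairs to the framework.
        interface Context<K, V> {
            void write(K key, V value);
        }

        abstract void map(KEYIN key, VALUEIN value, Context<KEYOUT, VALUEOUT> context);
    }

    // Mirrors "extends Mapper<LongWritable, Text, Text, IntWritable>" with ordinary types.
    static class TokenMapper extends SketchMapper<Long, String, String, Integer> {
        @Override
        void map(Long offset, String line, Context<String, Integer> context) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                context.write(tokenizer.nextToken(), 1); // emit (word, 1)
            }
        }
    }

    // Stand-in for the framework: calls map() once and records everything written.
    static List<String> mapLine(long offset, String line) {
        List<String> emitted = new ArrayList<>();
        new TokenMapper().map(offset, line, (word, count) -> emitted.add(word + "=" + count));
        return emitted;
    }

    public static void main(String[] args) {
        // prints [to=1, be=1, or=1, not=1, to=1, be=1]
        System.out.println(mapLine(0L, "to be or not to be"));
    }
}
```

Because Context is nested inside the mapper base class and inherited by the subclass, TokenMapper can refer to it by its short name, which is also why the original code can write Context without any qualifier.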

Your Reduce class extends Hadoop's Reducer class. The Reducer's input key and value types must match the output key and value types of the Mapper (or of the Combiner, if one is set). The Reducer subclass overrides the reduce() method, and your reducer logic goes there. The method receives a key together with all the values emitted for that key, returns void, and writes its result to the Context with context.write().
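The reduce step can be sketched in plain Java as well. This is only an illustration under assumptions: ReduceSketch and run are hypothetical names, a BiConsumer plays the role of the Context, and Integer replaces IntWritable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Plain-Java imitation of the word count reducer; no Hadoop classes are used.
class ReduceSketch {

    // Called once per distinct key, with every value emitted for that key.
    static void reduce(String key, Iterable<Integer> values, BiConsumer<String, Integer> context) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        context.accept(key, sum); // plays the role of context.write(key, new IntWritable(sum))
    }

    // Stand-in for the framework: runs reduce() for one key and records the output.
    static List<String> run(String key, List<Integer> values) {
        List<String> out = new ArrayList<>();
        reduce(key, values, (k, v) -> out.add(k + "=" + v));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("be", List.of(1, 1))); // prints [be=2]
    }
}
```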

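Between the map and reduce phases, Hadoop shuffles and sorts the map output so that all values for the same key reach a single reduce() call. If a combiner is configured, Hadoop may also run it on the map output before the shuffle; this job does not set one.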
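What the shuffle and sort step achieves can be approximated in a few lines of plain Java. This sketch runs in a single process and only shows the grouping effect; the real framework partitions and merges data across machines. ShuffleSketch and shuffle are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Single-process imitation of shuffle and sort; no Hadoop classes are used.
class ShuffleSketch {

    // Groups (word, count) pairs by word, keeping the keys in sorted order.
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // The pairs a mapper would emit for the line "to be or not to be".
        List<Map.Entry<String, Integer>> mapOutput = List.of(
            Map.entry("to", 1), Map.entry("be", 1), Map.entry("or", 1),
            Map.entry("not", 1), Map.entry("to", 1), Map.entry("be", 1));
        // prints {be=[1, 1], not=[1], or=[1], to=[1, 1]}
        System.out.println(shuffle(mapOutput));
    }
}
```

Each entry of the resulting map corresponds to one reduce() call: the key is the word and the list is the Iterable of values the reducer iterates over.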

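Your main method contains the code that sets up and submits the Hadoop job: it names the mapper and reducer classes, the output key and value types, the input and output formats, and the input and output paths taken from args[0] and args[1].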

A few more clarifications are available from macalester.edu and javacodegeeks.
