简体   繁体   English

Hadoop Word count:接收以字母“c”开头的单词总数

[英]Hadoop Word count: receive the total number of words that start with the letter "c"

Heres the Hadoop word count java map and reduce source code:下面是 Hadoop 字数统计 java map 和 reduce 源代码:

In the map function, I've gotten to where I can output all the word that starts with the letter "c" and also the total number of times that word appears, but what I'm trying to do is just output the total number of words starting with the letter "c" but I'm stuck a little on getting the total number.Any help would be greatly appreciated, Thank you.在 map 函数中,我已经到了可以输出所有以字母“c”开头的单词以及该单词出现的总次数的地方,但我想要做的只是输出总数以字母“c”开头的单词,但我在获取总数方面有点卡住了。任何帮助将不胜感激,谢谢。

Example例子

My Output of what I'm getting:我得到的输出:

could 2可以 2

can 3可以 3

cat 5猫 5

What I'm trying to get:我想得到什么:

c-total 10 c-总计 10

public static class MapClass extends MapReduceBase
   implements Mapper<LongWritable, Text, Text, IntWritable> {

private final static IntWritable one = new IntWritable(1);
private Text word = new Text();

public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {
  String line = value.toString();
  StringTokenizer itr = new StringTokenizer(line);
  while (itr.hasMoreTokens()) {
    word.set(itr.nextToken());
    if(word.toString().startsWith("c"){
    output.collect(word, one);
   }
  }
 } 
}


public static class Reduce extends MapReduceBase
implements Reducer<Text, IntWritable, Text, IntWritable> {

public void reduce(Text key, Iterator<IntWritable> values,
                   OutputCollector<Text, IntWritable> output,
                   Reporter reporter) throws IOException {
  int sum = 0;
  while (values.hasNext()) {
    sum += values.next().get(); //gets the sum of the words and add them together
  }
  output.collect(key, new IntWritable(sum)); //outputs the word and the number
  }
 }

Instead of 代替

output.collect(word, one);

in your mapper, try: 在您的映射器中,尝试:

output.collect("c-total", one);

Chris Gerken 's answer is right. 克里斯·格肯的答案是正确的。

If you are outputing word as your key it will only help you to calculate the count of unique words starting with "c" 如果您要输出单词作为关键字,则只会帮助您计算以“ c”开头的唯一单词的数量

Not all total count of "c". 并非所有“ c”的总数。

So for that you need to output a unique key from mapper. 因此,您需要从mapper输出唯一的密钥。

 while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            if(token.startsWith("c")){
                word.set("C_Count");
                output.collect(word, one);
            }

        }

Here is an example using New Api 这是使用New Api的示例

Driver class 驾驶舱

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job = new Job(conf, "wordcount");
        FileSystem fs = FileSystem.get(conf);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        if (fs.exists(new Path(args[1])))
            fs.delete(new Path(args[1]), true);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setJarByClass(WordCount.class);     
        job.waitForCompletion(true);
    }

}

Mapper class 映射器类

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            if(token.startsWith("c")){
                word.set("C_Count");
                context.write(word, one);
            }

        }
    }
}

Reducer class 减速机类

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

Simpler code for mapper: 映射器的简单代码:

public void map(LongWritable key, Text value,OutputCollector<Text,IntWritable> op, Reporter r)throws IOException
{
    String s = value.toString();
      for (String w : s.split("\\W+"))
       {
       if (w.length()>0)
        {
         if(w.startsWith("C")){
         op.collect(new Text("C-Count"), new IntWritable(1));        
         }
       }
  }
}
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Mapper extends Mapper<LongWritable, Text, Text, IntWritable> {

public void map(LongWritable ikey, Text ivalue, Context context)
        throws IOException, InterruptedException {
    
    String line= ivalue.toString();
    String [] values = line.split(" ");
    IntWritable val=new IntWritable(1);
    for(String i:values)
    {
        String x=i.charAt(0);
        if(x=='c')
        {
        context.write(new Text("c"),val);
        }   }
}}


public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
    int sum=0;
    for (IntWritable val : values) {
                    sum=sum+val.get();
    }
    context.write(key,new IntWritable(sum));
}}

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM