
How can I count a specific word using MapReduce?


I am modifying the ordinary word count program, which counts every word, so that it counts only one specific word.

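The reducer and map classes are the same as in the ordinary word count. The word is not counted correctly: the specific word occurs many times in my file, but the count comes out as one.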

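import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class wordcountmapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>   // mapper implementation (old org.apache.hadoop.mapred API)
{
    private final static IntWritable one = new IntWritable(1); // constant count of 1 emitted per match
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();      // convert the input line to a String
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            if (line.compareTo("Cold") == 0) {  // "Cold" is the specific word to count
                output.collect(word, one);      // I only get a count of 1 for "Cold", as if only the first line with "Cold" is counted and the following lines are skipped
            }
        }
    }
}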

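First, your if statement compares the whole line with "Cold", which is wrong. It should compare the current token with "Cold" instead, for example if (word.toString().equals("Cold")), since word already holds the token read on that iteration. Calling tokenizer.nextToken() a second time inside the loop, as in if (tokenizer.nextToken().equals("Cold")), would consume an extra token on every iteration and skip words.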
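The corrected comparison can be checked without a Hadoop cluster. Below is a minimal plain-Java sketch, not the question's actual job: the class SpecificWordCount and its map, reduce and count methods are hypothetical stand-ins that mimic the mapper (comparing each token instead of the line) and the usual summing word-count reducer on a few in-memory lines. It assumes Java 9 or later for List.of and Map.entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of the corrected map step plus the
// standard word-count reduce step, run locally on in-memory lines.
public class SpecificWordCount {

    // Map step: emit (target, 1) once for every token equal to the target word.
    public static List<Map.Entry<String, Integer>> map(String line, String target) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken(); // read each token exactly once
            if (token.equals(target)) {           // compare the token, not the whole line
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Reduce step: sum the 1s emitted for each key, as in ordinary word count.
    public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs map over every line, then reduces all emitted pairs.
    public static Map<String, Integer> count(List<String> lines, String target) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line, target));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Cold winter and Cold wind", "it is Cold", "warm day");
        System.out.println(count(lines, "Cold")); // prints {Cold=3}
    }
}
```

In the real mapper, the equivalent change is to test the token (for example word.toString()) rather than line before calling output.collect(word, one); the reducer can stay the standard summing one. Note that equals is case-sensitive, so "cold" would not be counted.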

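I'm not sure how the current logic produces a count of 1 for "Cold". Possibly the input contains a line consisting of the single word "Cold", since that is the only case in which the whole line equals "Cold".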
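To see why the original mapper reports 1, here is a small plain-Java reproduction of its if test, under the assumption that the input has one line that is exactly "Cold" plus other lines where "Cold" appears among other words. LineCompareRepro and emittedByLineCompare are hypothetical names; the class only counts how many (word, 1) pairs the question's map method would emit.

```java
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical, Hadoop-free reproduction of the question's map logic: the
// whole line is compared with "Cold", so a line matches only if it is exactly "Cold".
public class LineCompareRepro {

    // Counts the (word, 1) pairs the original mapper would emit for these lines.
    public static int emittedByLineCompare(List<String> lines) {
        int emitted = 0;
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                tokenizer.nextToken();             // token is read but never compared
                if (line.compareTo("Cold") == 0) { // same test as the question's mapper
                    emitted++;
                }
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Cold winter and Cold wind", "it is Cold", "Cold");
        System.out.println(emittedByLineCompare(lines)); // prints 1
    }
}
```

Only the last line matches, so three occurrences of "Cold" yield a single emitted pair, which matches the symptom described in the question.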


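Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.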

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM