How can I count a specific word using MapReduce?
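I am modifying the standard word count program, which counts every word, so that it counts only one specific word.
The reducer and mapper classes are otherwise the same as in the standard word count. The word is not being counted correctly: the specific word appears many times in my file, but the count comes out as one.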
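import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class wordcountmapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> // mapper implementation
{
    private final static IntWritable one = new IntWritable(1); // constant count of 1 emitted per match
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString(); // convert the input line to a String
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            if (line.compareTo("Cold") == 0) { // "Cold" is the specific word to get a count for
                output.collect(word, one); // getting 1 as the count for "Cold", as if it counts only the first "Cold" line and does not go on to the next line
            }
        }
    }
}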
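First, your if statement compares the whole line object with "Cold", which is wrong. It should compare the token with "Cold": if (tokenizer.nextToken().equals("Cold")). Note that each call to nextToken() advances the tokenizer, so read the token once per iteration; if the loop still calls word.set(tokenizer.nextToken()) first, compare word.toString() with "Cold" instead.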
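I am not sure how you are getting a count of 1 for "Cold" with the current logic. Perhaps the input contains a line that consists of the single word "Cold".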
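To make the difference concrete, here is a minimal plain-Java sketch that reproduces both checks outside Hadoop. The class SpecificWordCount and its two methods are hypothetical names made up for this illustration, and count++ merely stands in for output.collect(word, one); it is not the asker's actual job, only a model of the loop body under the assumption that the input contains one line that is exactly "Cold".

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class (not part of Hadoop) that models the body of map().
class SpecificWordCount {

    // Models the original check: the whole line is compared with the target.
    static int countComparingLine(List<String> lines, String target) {
        int count = 0;
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                tokenizer.nextToken(); // token is read but never inspected
                if (line.compareTo(target) == 0) {
                    count++; // stands in for output.collect(word, one)
                }
            }
        }
        return count;
    }

    // Models the suggested check: each token is compared with the target.
    static int countComparingToken(List<String> lines, String target) {
        int count = 0;
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                String token = tokenizer.nextToken(); // read each token exactly once
                if (token.equals(target)) {
                    count++; // stands in for output.collect(word, one)
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Cold", "It is Cold and Cold again", "cold");
        System.out.println(countComparingLine(lines, "Cold"));  // prints 1
        System.out.println(countComparingToken(lines, "Cold")); // prints 3
    }
}
```

With this sample input, the line comparison matches only the line that is exactly "Cold", which would explain a count of one, while the token comparison finds all three occurrences. The comparison is case-sensitive, so "cold" is not counted; equalsIgnoreCase could be used if that is not what you want.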