
Hadoop Reduce output using IntWritable always stops at 2

The Reduce program always outputs 2 as the count, even though there are more than two values for the given key.

For example, the word count test file contains the same line three times:

The word count test file has words like
The word count test file has words like
The word count test file has words like

The output is:

this 2
The 2
word 2

The Reduce code is:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    IntWritable count = null;

    for (IntWritable value : values) {
      if (count == null) {
        count = value;
      } else {
        count.set(count.get() + value.get());
      }
    }

    context.write(key, count);
  }
}

Can you please explain the problem here? When I use a plain int counter instead, it works fine.

count = value;

Don't do this. Hadoop reuses a single IntWritable instance for all of the values it passes to your reducer: on each iteration the framework overwrites that same object with the next value instead of allocating a new one. After count = value, count and value refer to the same object, so every new value clobbers your running total before you add to it, and the sum computed is always 1 + 1 = 2, no matter how many values the key has. This is also why a plain int counter works fine: primitives are copied by value, so nothing can alias your accumulator.
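To see the aliasing concretely, here is a minimal standalone sketch that imitates the framework's object reuse (the class name ReuseDemo is made up for illustration; it needs only hadoop-common on the classpath, no cluster):

import org.apache.hadoop.io.IntWritable;

public class ReuseDemo {
    public static void main(String[] args) {
        // The single object the framework recycles for every value.
        IntWritable value = new IntWritable();
        IntWritable count = null;

        // Three occurrences of the same key, each emitted as 1 by the mapper.
        int[] incoming = {1, 1, 1};
        for (int v : incoming) {
            value.set(v);                 // framework overwrites the SAME object
            if (count == null) {
                count = value;            // bug: count now aliases value
            } else {
                // count was just clobbered to 1 by value.set above,
                // so this computes 1 + 1 = 2 on every pass.
                count.set(count.get() + value.get());
            }
        }
        System.out.println(count.get());  // prints 2, not 3
    }
}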

Instead, copy the value into a fresh object of your own:

count = new IntWritable(value.get());

Note that a no-argument new IntWritable() would start the count at 0 and silently drop the first value; passing value.get() copies the first count correctly.
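For completeness, here is a sketch of the reduce method from the question with that one-line fix applied and everything else unchanged:

@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
    throws IOException, InterruptedException {
  IntWritable count = null;

  for (IntWritable value : values) {
    if (count == null) {
      count = new IntWritable(value.get()); // copy the value, not the reference
    } else {
      count.set(count.get() + value.get());
    }
  }

  context.write(key, count);
}

With this change, each emitted count reflects the actual number of occurrences of the word instead of stopping at 2.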
