
Hadoop Reduce output using IntWritable always stops at 2

The Reduce program always outputs 2 as the count, even though there are more than two values for the given key.

For example, the word count test file contains the same line three times:

The word count test file has words like
The word count test file has words like
The word count test file has words like

The output is:

this 2
The 2
word 2

The Reduce code is:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    IntWritable count = null;

    for (IntWritable value : values) {
      if (count == null) {
        count = value;
      } else {
        count.set(count.get() + value.get());
      }
    }

    context.write(key, count);
  }
}

Can you please explain the problem here? When I use a plain int counter instead, it works fine.

count = value;

Don't do this. Hadoop reuses a single IntWritable instance for all of the values it passes to your reducer: on each iteration the framework overwrites that same object with the next value instead of allocating a new one. After count = value, count and value refer to the same object, so every new value clobbers your running total before you add to it, and the sum computed is always 1 + 1 = 2, no matter how many values the key has. This is also why a plain int counter works fine: primitives are copied by value, so nothing can alias your accumulator.
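To see the aliasing concretely, here is a minimal standalone sketch that imitates the framework's object reuse (the class name ReuseDemo is made up for illustration; it needs only hadoop-common on the classpath, no cluster):

import org.apache.hadoop.io.IntWritable;

public class ReuseDemo {
    public static void main(String[] args) {
        // The single object the framework recycles for every value.
        IntWritable value = new IntWritable();
        IntWritable count = null;

        // Three occurrences of the same key, each emitted as 1 by the mapper.
        int[] incoming = {1, 1, 1};
        for (int v : incoming) {
            value.set(v);                 // framework overwrites the SAME object
            if (count == null) {
                count = value;            // bug: count now aliases value
            } else {
                // count was just clobbered to 1 by value.set above,
                // so this computes 1 + 1 = 2 on every pass.
                count.set(count.get() + value.get());
            }
        }
        System.out.println(count.get());  // prints 2, not 3
    }
}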

Instead, copy the value into a fresh object of your own:

count = new IntWritable(value.get());

Note that a no-argument new IntWritable() would start the count at 0 and silently drop the first value; passing value.get() copies the first count correctly.
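For completeness, here is a sketch of the reduce method from the question with that one-line fix applied and everything else unchanged:

@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
    throws IOException, InterruptedException {
  IntWritable count = null;

  for (IntWritable value : values) {
    if (count == null) {
      count = new IntWritable(value.get()); // copy the value, not the reference
    } else {
      count.set(count.get() + value.get());
    }
  }

  context.write(key, count);
}

With this change, each emitted count reflects the actual number of occurrences of the word instead of stopping at 2.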
