Here is a code sample from Hadoop's WordCount example:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private Text outputKey;
    private IntWritable outputVal;

    @Override
    public void setup(Context context) {
        outputKey = new Text();
        outputVal = new IntWritable(1);
    }

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer stk = new StringTokenizer(value.toString());
        while (stk.hasMoreTokens()) {
            outputKey.set(stk.nextToken());
            context.write(outputKey, outputVal);
        }
    }
}
There is only one outputKey instance. In the while loop, outputKey is set to a different word on each iteration and then written to the context as the key. Is that single outputKey instance shared across all of the emitted <key, value> pairs?
Why not use context.write(new Text(stk.nextToken()), new IntWritable(1)) instead?
It's purely for efficiency: reusing one mutable instance avoids allocating (and later garbage-collecting) a new Text and IntWritable for every token, which matters when map is called millions of times.
Read this article: http://www.joeondata.com/2014/05/22/memory-management-in-hadoop-mapreduce/
"For instance, if you use an org.apache.hadoop.io.Text as a map output key, you can create a single non-static final instance of a Text object in your Mapper class. Then each time the map method is called, you can either clear or just set the singular text instance and then write it to the mapper's context. The context will then use/copy the data before it calls your map method again so you don't have to worry about overwriting data being used by the framework."
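The key point in that quote is that the context copies the key/value data during write(), so mutating the same instance afterwards is safe. Here is a minimal sketch of that contract using hypothetical MutableText and MiniContext classes (these are stand-ins for illustration, not Hadoop's real Text or Context, which serialize into an output buffer):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Stand-in for a mutable Writable key, like org.apache.hadoop.io.Text.
class MutableText {
    private String value = "";
    void set(String v) { value = v; }            // overwrite in place, like Text.set()
    @Override public String toString() { return value; }
}

// Stand-in for the mapper's Context: it copies the key's contents at
// write() time, mirroring how Hadoop serializes key/value pairs into
// its output buffer before map() is called again.
class MiniContext {
    final List<String> written = new ArrayList<>();
    void write(MutableText key, int value) {
        written.add(key.toString() + "=" + value); // String copy taken here
    }
}

public class ReuseDemo {
    public static void main(String[] args) {
        MiniContext context = new MiniContext();
        MutableText outputKey = new MutableText();   // single reused instance
        StringTokenizer stk = new StringTokenizer("hello world hello");
        while (stk.hasMoreTokens()) {
            outputKey.set(stk.nextToken());          // mutate, don't allocate
            context.write(outputKey, 1);             // data already copied
        }
        System.out.println(context.written);         // prints [hello=1, world=1, hello=1]
    }
}
```

Because the copy happens inside write(), each emitted pair keeps the word it was written with, even though only one key object ever exists. If the context merely stored a reference to the key instead of copying, every entry would end up showing the last token, and then allocating a fresh object per write would be the only safe option.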