Here is a code sample from Hadoop's WordCount example:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private Text outputKey;
    private IntWritable outputVal;

    @Override
    public void setup(Context context) {
        outputKey = new Text();
        outputVal = new IntWritable(1);
    }

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer stk = new StringTokenizer(value.toString());
        while (stk.hasMoreTokens()) {
            outputKey.set(stk.nextToken());
            context.write(outputKey, outputVal);
        }
    }
}
There is only one outputKey instance. In the while loop, outputKey is set to a different word on each iteration and then written to the context as the key. Is that single outputKey instance shared across all of the emitted <key, value> pairs?
Why not use context.write(new Text(stk.nextToken()), new IntWritable(1)) instead?
It's purely for efficiency: reusing one mutable instance avoids allocating (and later garbage-collecting) a new Text and IntWritable for every token, which matters when map is called millions of times.
Read this article: http://www.joeondata.com/2014/05/22/memory-management-in-hadoop-mapreduce/
"For instance, if you use an org.apache.hadoop.io.Text as a map output key, you can create a single non-static final instance of a Text object in your Mapper class. Then each time the map method is called, you can either clear or just set the singular text instance and then write it to the mapper's context. The context will then use/copy the data before it calls your map method again so you don't have to worry about overwriting data being used by the framework."
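The key point in that quote is that the context copies the key/value data during write(), so mutating the same instance afterwards is safe. Here is a minimal sketch of that contract using hypothetical MutableText and MiniContext classes (these are stand-ins for illustration, not Hadoop's real Text or Context, which serialize into an output buffer):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Stand-in for a mutable Writable key, like org.apache.hadoop.io.Text.
class MutableText {
    private String value = "";
    void set(String v) { value = v; }            // overwrite in place, like Text.set()
    @Override public String toString() { return value; }
}

// Stand-in for the mapper's Context: it copies the key's contents at
// write() time, mirroring how Hadoop serializes key/value pairs into
// its output buffer before map() is called again.
class MiniContext {
    final List<String> written = new ArrayList<>();
    void write(MutableText key, int value) {
        written.add(key.toString() + "=" + value); // String copy taken here
    }
}

public class ReuseDemo {
    public static void main(String[] args) {
        MiniContext context = new MiniContext();
        MutableText outputKey = new MutableText();   // single reused instance
        StringTokenizer stk = new StringTokenizer("hello world hello");
        while (stk.hasMoreTokens()) {
            outputKey.set(stk.nextToken());          // mutate, don't allocate
            context.write(outputKey, 1);             // data already copied
        }
        System.out.println(context.written);         // prints [hello=1, world=1, hello=1]
    }
}
```

Because the copy happens inside write(), each emitted pair keeps the word it was written with, even though only one key object ever exists. If the context merely stored a reference to the key instead of copying, every entry would end up showing the last token, and then allocating a fresh object per write would be the only safe option.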