How is the Context object used within the run method of the Mapper class in Hadoop MapReduce?

Here is the source code for Mapper.run():

public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  // Ask the context for the next key/value pair and pass it to map()
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
  cleanup(context);
}

As you can see, context is used both for reading and writing. How is that possible? That is, context.getCurrentKey() and context.getCurrentValue() retrieve the current key/value pair from the context, and that pair is passed to the map function. Is the same context used for both input and output?

Yes, the same context is used for both input and output. It stores references to a RecordReader and a RecordWriter. Whenever context.getCurrentKey() and context.getCurrentValue() are called to retrieve the current key/value pair, the request is delegated to the RecordReader. And when context.write() is called, the call is delegated to the RecordWriter.
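For illustration, here is a minimal word-count style Mapper (the class name and tokenization are illustrative, not from the original question). The framework feeds each key/value pair in through the same context whose write() method emits the output:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // key and value arrived via the context's RecordReader (see run() above);
    // context.write() hands each output pair to the RecordWriter.
    for (String token : value.toString().split("\\s+")) {
      word.set(token);
      context.write(word, ONE);
    }
  }
}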

Note that RecordReader and RecordWriter are actually abstract classes.
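Abridged from the Hadoop API (javadoc omitted), their core methods look like this:

import java.io.Closeable;
import java.io.IOException;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// org.apache.hadoop.mapreduce.RecordReader (abridged)
public abstract class RecordReader<KEYIN, VALUEIN> implements Closeable {
  public abstract void initialize(InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException;
  public abstract boolean nextKeyValue() throws IOException, InterruptedException;
  public abstract KEYIN getCurrentKey() throws IOException, InterruptedException;
  public abstract VALUEIN getCurrentValue() throws IOException, InterruptedException;
  public abstract float getProgress() throws IOException, InterruptedException;
  public abstract void close() throws IOException;
}

// org.apache.hadoop.mapreduce.RecordWriter (abridged)
public abstract class RecordWriter<K, V> {
  public abstract void write(K key, V value) throws IOException, InterruptedException;
  public abstract void close(TaskAttemptContext context)
      throws IOException, InterruptedException;
}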

Update:

org.apache.hadoop.mapreduce.Mapper$Context implements org.apache.hadoop.mapreduce.MapContext, which in turn extends org.apache.hadoop.mapreduce.TaskInputOutputContext.

Look at the source of org.apache.hadoop.mapreduce.task.MapContextImpl, which is itself a subclass of org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl, to see exactly where Context delegates input to the RecordReader and output to the RecordWriter.
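To make that delegation concrete, here is a self-contained sketch (the Sketch class names are hypothetical; the method bodies mirror the one-line pass-through calls found in MapContextImpl and TaskInputOutputContextImpl):

import java.io.IOException;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.RecordWriter;

// Output side (modeled on TaskInputOutputContextImpl): write() goes to the RecordWriter.
abstract class TaskOutputContextSketch<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
  private final RecordWriter<KEYOUT, VALUEOUT> output;

  TaskOutputContextSketch(RecordWriter<KEYOUT, VALUEOUT> output) {
    this.output = output;
  }

  public void write(KEYOUT key, VALUEOUT value) throws IOException, InterruptedException {
    output.write(key, value); // delegated to the RecordWriter
  }
}

// Input side (modeled on MapContextImpl): reads go to the RecordReader.
class MapContextSketch<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
    extends TaskOutputContextSketch<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
  private final RecordReader<KEYIN, VALUEIN> reader;

  MapContextSketch(RecordReader<KEYIN, VALUEIN> reader,
                   RecordWriter<KEYOUT, VALUEOUT> output) {
    super(output);
    this.reader = reader;
  }

  public boolean nextKeyValue() throws IOException, InterruptedException {
    return reader.nextKeyValue(); // delegated to the RecordReader
  }

  public KEYIN getCurrentKey() throws IOException, InterruptedException {
    return reader.getCurrentKey(); // delegated to the RecordReader
  }

  public VALUEIN getCurrentValue() throws IOException, InterruptedException {
    return reader.getCurrentValue(); // delegated to the RecordReader
  }
}

The real classes carry more state (configuration, counters, status reporting), but the input and output paths reduce to exactly these pass-through calls.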
