How to stop map task from setup method?

My job class contains a mapper class, and sometimes I need to stop the current task from running (the Hadoop MapReduce framework spawns one map task for each InputSplit generated by the job's InputFormat):

public static class TestJobMapper
        extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        // here I want to check some predicate and possibly stop the task
        // http://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/mapreduce/Mapper.html
    }

    // continue....

You can break it quite easily by overriding the run() method.

The default implementation of run() in Mapper looks like this:

setup(context);
try {
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
} finally {
  cleanup(context);
}

What you can do is wrap your own check around it:

@Override
public void run(Mapper<LongWritable, Text, Text, Text>.Context context)
        throws IOException, InterruptedException {

    if (Predicate.runMapper(context)) {
        super.run(context); // do the usual setup/map/cleanup cycle
    }
}

That way, the task goes straight to completion whenever your predicate says so. This still incurs some overhead, because the task is still scheduled and launched, but it is easier than changing the input format.
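To see the effect of guarding run() without a cluster, here is a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: SketchContext and SketchMapper are simplified stand-ins that only mimic the setup/map/cleanup template, not the real Hadoop classes, and the boolean shouldRun plays the role of Predicate.runMapper(context).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Dependency-free stand-in for the Mapper lifecycle; NOT the Hadoop API. */
class MapperRunSketch {

    /** Hypothetical miniature of Mapper.Context: feeds records, logs lifecycle events. */
    static class SketchContext {
        private final Iterator<String> records;
        private String current;
        final List<String> events = new ArrayList<>();

        SketchContext(List<String> input) {
            this.records = input.iterator();
        }

        boolean nextKeyValue() {
            if (!records.hasNext()) {
                return false;
            }
            current = records.next();
            return true;
        }

        String getCurrentValue() {
            return current;
        }
    }

    /** Mirrors the setup/map/cleanup template of the default run(). */
    static class SketchMapper {
        protected void setup(SketchContext ctx) { ctx.events.add("setup"); }
        protected void map(String value, SketchContext ctx) { ctx.events.add("map:" + value); }
        protected void cleanup(SketchContext ctx) { ctx.events.add("cleanup"); }

        public void run(SketchContext ctx) {
            setup(ctx);
            try {
                while (ctx.nextKeyValue()) {
                    map(ctx.getCurrentValue(), ctx);
                }
            } finally {
                cleanup(ctx);
            }
        }
    }

    /** Skips the whole lifecycle, including setup and cleanup, when shouldRun is false. */
    static class GuardedMapper extends SketchMapper {
        private final boolean shouldRun; // stand-in for Predicate.runMapper(context)

        GuardedMapper(boolean shouldRun) {
            this.shouldRun = shouldRun;
        }

        @Override
        public void run(SketchContext ctx) {
            if (shouldRun) {
                super.run(ctx);
            }
        }
    }

    /** Runs a guarded mapper over the input and returns the recorded lifecycle events. */
    static List<String> runWith(boolean shouldRun, List<String> input) {
        SketchContext ctx = new SketchContext(input);
        new GuardedMapper(shouldRun).run(ctx);
        return ctx.events;
    }

    public static void main(String[] args) {
        System.out.println(runWith(true, Arrays.asList("a", "b")));  // [setup, map:a, map:b, cleanup]
        System.out.println(runWith(false, Arrays.asList("a", "b"))); // []
    }
}
```

Note that when the guard fails, neither setup nor cleanup runs, so the predicate must not depend on anything that setup would have initialized.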

You cannot stop the task from within the setup method.

However, if your logic for not running the mapper on a certain split is based on the split number, then you may be able to use a custom InputFormat and record reader to skip certain records or input splits.
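The filtering step of that idea can be sketched on its own. This is only an illustration under stated assumptions: keepSplits is a hypothetical helper, the strings stand in for real InputSplit objects, and the "split number" is taken to be the position in the list. In a real job, logic like this would presumably sit in a custom InputFormat, applied to the list of splits it produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical helper: keeps only the splits whose position passes the predicate. */
class SplitFilterSketch {

    static <T> List<T> keepSplits(List<T> allSplits, IntPredicate keepIndex) {
        List<T> kept = new ArrayList<>();
        for (int i = 0; i < allSplits.size(); i++) {
            if (keepIndex.test(i)) {
                kept.add(allSplits.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Strings stand in for real split objects; keep even-numbered splits only.
        List<String> splits = Arrays.asList("part-0", "part-1", "part-2", "part-3");
        System.out.println(keepSplits(splits, i -> i % 2 == 0)); // [part-0, part-2]
    }
}
```

Because a split that is never produced never gets a map task, this avoids the launch overhead that the run() guard still pays.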
