
Hadoop - Creating a single instance of a class for each map() functions inside the Mapper for a particular node

I have a class like this in Java for Hadoop MapReduce:

public class MyClass {
    public static class MyClassMapper extends Mapper<Object, Text, Text, Text> {
        static SomeClass someClassObj = new SomeClass();

        @Override
        protected void map(Object key, Text value, Context context) {
            String someText = someClassObj.getSomeThing();
        }
    }
}

I need only a single instance of someClassObj to be available to the map() function per node. How can I achieve that?

Please feel free to ask if you need further details on this topic.

Thank you!

The mapreduce.tasktracker.map.tasks.maximum property (default 2) controls the maximum number of map tasks that a TaskTracker runs simultaneously. Set this value to 1.

Each map task is launched in a separate JVM. Also set mapreduce.job.jvm.numtasks to -1 to reuse the JVM.
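These two properties can be set in mapred-site.xml (or programmatically on the job's Configuration); a sketch assuming the MRv2-era property names mentioned above:

<configuration>
  <!-- run at most one map task at a time per TaskTracker -->
  <property>
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
  <!-- -1 = reuse the same JVM for an unlimited number of a job's tasks -->
  <property>
    <name>mapreduce.job.jvm.numtasks</name>
    <value>-1</value>
  </property>
</configuration>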

With the above settings, all the map tasks run sequentially in a single JVM. Now SomeClass has to be made a singleton class.
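A minimal singleton sketch (SomeClass and getSomeThing() are taken from the question; the body of getSomeThing() here is a placeholder). With JVM reuse enabled, every map task in that JVM sees the same instance:

```java
// Sketch: SomeClass as an eagerly initialized singleton.
public class SomeClass {
    // Created once when the class is loaded; shared by all tasks in this JVM.
    private static final SomeClass INSTANCE = new SomeClass();

    // Private constructor prevents callers from creating extra instances.
    private SomeClass() { }

    public static SomeClass getInstance() {
        return INSTANCE;
    }

    public String getSomeThing() {
        return "something"; // placeholder for the real logic
    }
}
```

The mapper would then call SomeClass.getInstance().getSomeThing() instead of constructing its own SomeClass.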

This is not a best practice, because the node is under-utilized due to the lower number of map tasks that can run in parallel. Also, with JVM reuse there is no isolation between tasks, so any memory leak is carried over until the JVM crashes.
