
Hadoop - Creating a single instance of a class for each map() function inside the Mapper for a particular node

I have a class like this in Java for Hadoop MapReduce:

public class MyClass {
    public static class MyClassMapper extends Mapper<Object, Text, Text, Text> {
        static SomeClass someClassObj = new SomeClass();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String someText = someClassObj.getSomeThing();
        }
    }
}

I need only a single instance of someClassObj to be available to the map() function per node. How can I achieve that?

Please feel free to ask if you need further details on this topic.

Thank you!

The mapreduce.tasktracker.map.tasks.maximum property (default 2) controls the maximum number of map tasks that a TaskTracker runs simultaneously. Set this value to 1.

Each map task is launched in a separate JVM. Also set mapreduce.job.jvm.numtasks to -1 to reuse the JVM.
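A sketch of what that configuration might look like in mapred-site.xml (the two property names are the ones mentioned above; they could equally be set per job on the job's Configuration object):

```xml
<!-- mapred-site.xml fragment: run at most one map task at a time
     per TaskTracker, and reuse the JVM for an unlimited number of
     tasks from the same job -->
<configuration>
  <property>
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapreduce.job.jvm.numtasks</name>
    <value>-1</value>
  </property>
</configuration>
```

Note that JVM reuse only applies to tasks of the same job; tasks from different jobs still get fresh JVMs.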

The above settings will make all the map tasks run sequentially in a single JVM. Now, SomeClass has to be made a singleton class.
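A minimal sketch of SomeClass as a singleton, using the initialization-on-demand holder idiom for lazy, thread-safe construction (getSomeThing() is the hypothetical method from the question; its body here is a placeholder):

```java
// Singleton: one shared instance per JVM, created lazily on first use.
public class SomeClass {

    // Private constructor prevents direct instantiation via "new".
    private SomeClass() { }

    // The holder class is not loaded until getInstance() is first
    // called; the JVM's class-loading guarantees make this thread-safe.
    private static class Holder {
        static final SomeClass INSTANCE = new SomeClass();
    }

    public static SomeClass getInstance() {
        return Holder.INSTANCE;
    }

    // Placeholder for whatever the real method returns.
    public String getSomeThing() {
        return "something";
    }
}
```

The mapper would then call SomeClass.getInstance().getSomeThing() instead of constructing its own instance, so every map task running in the reused JVM sees the same object.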

This is not a best practice, as the node is underutilized because fewer map tasks can run in parallel. Also, with JVM reuse there is no isolation between tasks, so any memory leak is carried forward until the JVM crashes.
