
Hadoop - Creating a single instance of a class for each map() function inside the Mapper for a particular node

I have a class like this in Java for Hadoop MapReduce:

public class MyClass {
    public static class MyClassMapper extends Mapper<Object, Text, Text, Text> {
        static SomeClass someClassObj = new SomeClass();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String someText = someClassObj.getSomeThing();
        }
    }
}

I need only a single instance of someClassObj to be available to the map() function per node. How can I achieve that?

Please feel free to ask if you need further details on this topic.

Thank you!

The mapreduce.tasktracker.map.tasks.maximum property (default 2) controls the maximum number of map tasks that a TaskTracker runs simultaneously. Set this value to 1.

Each map task is launched in a separate JVM. Also set mapreduce.job.jvm.numtasks to -1 to reuse the JVM.
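A sketch of what that configuration might look like in mapred-site.xml (the two property names are the ones mentioned above; they could equally be set per job on the job's Configuration object):

```xml
<!-- mapred-site.xml fragment: run at most one map task at a time
     per TaskTracker, and reuse the JVM for an unlimited number of
     tasks from the same job -->
<configuration>
  <property>
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapreduce.job.jvm.numtasks</name>
    <value>-1</value>
  </property>
</configuration>
```

Note that JVM reuse only applies to tasks of the same job; tasks from different jobs still get fresh JVMs.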

The above settings will make all the map tasks run sequentially in a single JVM. Now, SomeClass has to be made a singleton class.
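A minimal sketch of SomeClass as a singleton, using the initialization-on-demand holder idiom for lazy, thread-safe construction (getSomeThing() is the hypothetical method from the question; its body here is a placeholder):

```java
// Singleton: one shared instance per JVM, created lazily on first use.
public class SomeClass {

    // Private constructor prevents direct instantiation via "new".
    private SomeClass() { }

    // The holder class is not loaded until getInstance() is first
    // called; the JVM's class-loading guarantees make this thread-safe.
    private static class Holder {
        static final SomeClass INSTANCE = new SomeClass();
    }

    public static SomeClass getInstance() {
        return Holder.INSTANCE;
    }

    // Placeholder for whatever the real method returns.
    public String getSomeThing() {
        return "something";
    }
}
```

The mapper would then call SomeClass.getInstance().getSomeThing() instead of constructing its own instance, so every map task running in the reused JVM sees the same object.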

This is not a best practice, as the node is underutilized because fewer map tasks can run in parallel. Also, with JVM reuse there is no isolation between tasks, so any memory leak is carried forward until the JVM crashes.
