
Hadoop: use one instance for each mapper

I'm using Hadoop's MapReduce to parse XML files. I have a class called Parser with a parse() method that parses the XML files, and I want to use it in the Mapper's map() function.

However, this means that every time I want to call the Parser, I need to create a Parser instance. But this instance should be the same for each map job, so I'm wondering whether I can instantiate the Parser just once.

One more add-on question: why is the Mapper class always static?
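On the static question, one likely reason can be shown in plain Java without any Hadoop dependency. Hadoop creates the mapper reflectively through a no-argument constructor, and a non-static inner class has no such constructor because its constructor implicitly takes the enclosing instance. This only matters when the mapper is nested inside another class, such as the driver; a top-level mapper class needs no static modifier. The class names and the canInstantiate helper below are hypothetical stand-ins, not Hadoop code.

```java
import java.lang.reflect.Constructor;

public class StaticMapperDemo {

    // Stand-in for a mapper declared as a static nested class.
    static class StaticMapper {
    }

    // Stand-in for a mapper declared as a non-static inner class.
    // Its only constructor implicitly takes a StaticMapperDemo instance.
    class InnerMapper {
    }

    // Hypothetical helper: mimics reflective creation through a
    // no-argument constructor and reports whether it succeeded.
    static boolean canInstantiate(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // NoSuchMethodException lands here for the inner class.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("static nested: " + canInstantiate(StaticMapper.class));
        System.out.println("inner:         " + canInstantiate(InnerMapper.class));
    }
}
```

The static nested class can be created this way and the inner class cannot, which is why a mapper nested inside another class is declared static.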

To ensure one parser instance per Mapper, instantiate the parser in the mapper's setup() method and release it in the cleanup() method.

We applied the same approach to a protobuf parser we had. You do need to make sure that your parser instance is thread-safe and holds no shared data. Note: setup() and cleanup() are each called only once per mapper (that is, once per map task), so we can initialize private variables there. To clarify what cricket_007 said in "In a distributed computing environment, sharing instances of a variable isn't possible...":

We have a practice of reusing Writable objects instead of creating new ones every time we need them. We instantiate a Writable once and re-set it multiple times, as described in Tip 6. Parser objects can be reused in the same way (Tip 6 style), as in the code below. For example:

    // One parser per mapper: created in setup(), reused by every map() call.
    private YourXMLParser xmlParser = null;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        xmlParser = new YourXMLParser();
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        super.cleanup(context);
        xmlParser = null; // drop the reference once the mapper is done
    }
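To see the effect of this pattern without a cluster, here is a minimal sketch in plain Java. It is a stand-in, not Hadoop code: SketchMapper, CountingParser, and the tag-stripping parse() are hypothetical, and a StringBuilder plays the role of the reused Writable. The run() method follows the same order as Hadoop's Mapper.run(): setup once, map for every record, then cleanup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReuseDemo {

    // Hypothetical parser that counts how many instances were created.
    static class CountingParser {
        static int created = 0;

        CountingParser() {
            created++;
        }

        // Placeholder "parsing": strips tags and returns the text content.
        String parse(String xml) {
            return xml.replaceAll("<[^>]*>", "").trim();
        }
    }

    // Stand-in with the same lifecycle hooks as a Hadoop Mapper.
    static class SketchMapper {
        private CountingParser parser;                        // created once in setup()
        private final StringBuilder outValue = new StringBuilder(); // reused like a Writable
        final List<String> emitted = new ArrayList<>();

        void setup() {
            parser = new CountingParser();
        }

        void map(String record) {
            outValue.setLength(0);                 // re-set instead of allocating a new object
            outValue.append(parser.parse(record));
            emitted.add(outValue.toString());      // copy out, much as a write would serialize
        }

        void cleanup() {
            parser = null;
        }

        // Same order as Mapper.run(): setup, map per record, cleanup.
        void run(List<String> records) {
            setup();
            for (String record : records) {
                map(record);
            }
            cleanup();
        }
    }

    public static void main(String[] args) {
        SketchMapper mapper = new SketchMapper();
        mapper.run(Arrays.asList("<a>1</a>", "<a>2</a>", "<a>3</a>"));
        System.out.println(mapper.emitted + " parsers created: " + CountingParser.created);
    }
}
```

Three records are mapped, but only one parser is ever constructed. Moving the `new CountingParser()` call into map() would create one parser per record instead.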

