
Hadoop: use one instance for each mapper

I'm using Hadoop MapReduce to parse XML files. I have a class called Parser with a parse() method that parses the XML files, and I want to use it in the Mapper's map() function.

However, this means that every time I want to call the Parser, I need to create a new Parser instance, even though the same instance could serve the whole map job. Can I instantiate the Parser just once?

As an add-on question: why is the Mapper class always static?

To ensure one parser instance per Mapper, instantiate the parser in the mapper's setup() method and release it in the cleanup() method.
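The lifecycle behind that advice can be tried without a cluster. The following is a minimal, Hadoop-free sketch: `ReusedParserSketch` and its `setup()`, `map()` and `cleanup()` methods are hypothetical stand-ins that only mimic the order in which Hadoop calls the corresponding Mapper methods, and the JDK's `DocumentBuilder` stands in for the Parser class from the question. It is meant to show that the parser is built once and then reused for every record, not to reproduce Hadoop's API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical stand-in for a Mapper: setup() once, map() per record, cleanup() once.
class ReusedParserSketch {
    private DocumentBuilder builder;   // created in setup(), reused by every map() call
    private int buildersCreated = 0;   // counts how many parser instances were built

    void setup() throws Exception {
        builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        buildersCreated++;
    }

    // Parses one XML record with the shared builder and returns the root tag name.
    String map(String xmlRecord) throws Exception {
        byte[] bytes = xmlRecord.getBytes(StandardCharsets.UTF_8);
        Document doc = builder.parse(new ByteArrayInputStream(bytes));
        return doc.getDocumentElement().getTagName();
    }

    void cleanup() {
        builder = null;                // drop the reference once all records are processed
    }

    int getBuildersCreated() {
        return buildersCreated;
    }

    public static void main(String[] args) throws Exception {
        ReusedParserSketch task = new ReusedParserSketch();
        task.setup();
        for (String record : new String[] {"<a/>", "<b><c/></b>"}) {
            System.out.println(task.map(record));   // prints "a", then "b"
        }
        task.cleanup();
        System.out.println("builders created: " + task.getBuildersCreated()); // 1
    }
}
```

A `DocumentBuilder` is not safe for concurrent use, so this pattern relies on the instance being owned by a single mapper and never shared between threads.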

We applied the same approach to a protobuf parser we had. You need to make sure that your parser instance is thread-safe and holds no shared data. Note: setup() and cleanup() are each called only once per mapper, so private variables can be initialized there.

To clarify what cricket_007 said ("In a distributed computing environment, sharing instances of a variable isn't possible..."): it is common practice to reuse Writable objects instead of creating new ones every time they are needed. A Writable can be instantiated once and re-set many times, as described in Tip 6. Parser objects can be reused in the same way (Tip 6 style), as in the code below. For example:

// Members of your Mapper subclass: one parser per mapper, reused by every map() call.
private YourXMLParser xmlParser = null;

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    super.setup(context);
    xmlParser = new YourXMLParser(); // runs once, before the first map() call
}

@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
    super.cleanup(context);
    xmlParser = null; // runs once, after the last map() call
}
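On the add-on question about static: Hadoop creates mapper instances through reflection using a no-argument constructor, so a Mapper declared inside another class has to be a static nested class. A non-static inner class has a constructor that implicitly takes the enclosing instance, so no no-argument constructor exists. A Mapper declared as its own top-level class does not need the static keyword at all. The sketch below shows the difference with plain Java reflection; `StaticNestedSketch`, `FakeMapper`, `StaticMapper`, `InnerMapper` and `canInstantiateWithNoArgs` are hypothetical names used only for illustration, and no Hadoop classes are involved.

```java
import java.lang.reflect.Constructor;

class StaticNestedSketch {
    // Hypothetical base type standing in for Hadoop's Mapper.
    static abstract class FakeMapper {}

    // Static nested class: has a real no-argument constructor.
    static class StaticMapper extends FakeMapper {}

    // Non-static inner class: its constructor implicitly takes a StaticNestedSketch.
    class InnerMapper extends FakeMapper {}

    // Tries to build an instance the way a framework would, via a no-arg constructor.
    static boolean canInstantiateWithNoArgs(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false; // e.g. NoSuchMethodException for the inner class
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiateWithNoArgs(StaticMapper.class)); // true
        System.out.println(canInstantiateWithNoArgs(InnerMapper.class));  // false
    }
}
```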

