
MapReduce: How to pass HashMap to mappers

I'm designing the new generation of an analysis system that needs to process many events from many sensors in near-real time. To do that, I want to use one of the Big Data analytics platforms such as Hadoop, Spark Streaming, or Flink.

In order to analyze each event, I need to use some metadata from a database table, or at least load it into a cached map.

The problem is that each mapper is going to be parallelized on several nodes.

So I have two things to handle:

  • First, how do I load/pass a HashMap to a mapper?
  • Second, is there any way to keep the HashMap consistent between the mappers?

Serialize the HashMap to a file, store that file in HDFS, and during the MapReduce job configuration phase use the DistributedCache to spread the serialized file across all the mappers. Then, in the map phase, each mapper can read the file, deserialize it, and access the HashMap locally. Because every mapper deserializes the same read-only copy, the map stays consistent across nodes.
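A minimal sketch of the serialize/deserialize step using plain Java object serialization. The class name `MetadataCache`, the file name, and the sample sensor keys are illustrative; in a real Hadoop job you would upload the serialized file to HDFS, register it with `Job.addCacheFile(...)`, and call the read method from the mapper's `setup()`.

```java
import java.io.*;
import java.util.HashMap;

public class MetadataCache {

    // Write the map to a file with standard Java object serialization.
    // In the real flow this file would then be copied into HDFS.
    static void writeMap(HashMap<String, String> map, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(map);
        }
    }

    // Read the map back; each mapper would do this once in its setup()
    // method, so the lookup cost per event is just a HashMap get.
    @SuppressWarnings("unchecked")
    static HashMap<String, String> readMap(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (HashMap<String, String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sensor metadata used only for this demo.
        HashMap<String, String> meta = new HashMap<>();
        meta.put("sensor-1", "building-A");
        meta.put("sensor-2", "building-B");

        File f = File.createTempFile("metadata", ".ser");
        writeMap(meta, f);

        HashMap<String, String> restored = readMap(f);
        System.out.println(restored.get("sensor-1")); // prints building-A
        f.delete();
    }
}
```

Since every mapper deserializes its own private copy, there is no cross-node synchronization; this is appropriate only for read-only lookup data, which is exactly the metadata use case described above.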
