
HBase chain MapReduce job with broadcasting smaller tables to all Mappers

I am trying to write a chained MapReduce job on data stored in HBase tables and need some help with the concept. I am not expecting anyone to provide code, but pseudo code based on HBase's Java API would be nice.

In a nutshell, here is what I am trying to do:

MapReduce Job 1: Read data from two tables with no common row keys and build a summary from them in the reducer. The reducer's output is a Java object containing the summary, serialized to a byte array. I store this object in a temporary table in HBase.
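Assuming the summary can be modeled as a plain serializable Java object (the class name `JobSummary` and its fields are placeholders, not from the question), the reducer could turn it into a byte array suitable for an HBase cell value and read it back later. A minimal round-trip sketch:

```java
import java.io.*;

// Hypothetical summary produced by Job 1's reducer; the class name and
// fields are illustrative only.
class JobSummary implements Serializable {
    private static final long serialVersionUID = 1L;

    final long rowCount;
    final double total;

    JobSummary(long rowCount, double total) {
        this.rowCount = rowCount;
        this.total = total;
    }

    // Serialize to a byte[] suitable for storing as an HBase cell value,
    // e.g. put.addColumn(CF, QUALIFIER, JobSummary.toBytes(summary)).
    static byte[] toBytes(JobSummary s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(s);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialize the cell value read back in Job 2.
    static JobSummary fromBytes(byte[] bytes) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (JobSummary) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The HBase `Put`/`Get` calls around this are omitted; only the serialization round-trip is shown.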

MapReduce Job 2: This is where I am having problems. I now need to read this summary object so that it is available in every mapper: when I read data from a third (different) table, I want to use the summary to perform further calculations on the rows coming from that table.
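Since the summary is small, one alternative to the distributed cache is to fetch it once in the Job 2 driver and ship the serialized bytes to every mapper through the job `Configuration`. Configuration values are strings, so the bytes need an encoding such as Base64. The Hadoop calls (`conf.set`, `context.getConfiguration().get`) are shown only in comments so the sketch stays self-contained; the key name `summary.bytes` is an arbitrary choice for this example:

```java
import java.util.Base64;

// Sketch: shipping the small serialized summary to every mapper through the
// job Configuration instead of the distributed cache.
class SummaryConfigCodec {

    static final String CONF_KEY = "summary.bytes";

    // In the Job 2 driver, after reading the summary cell from the temp table:
    //   conf.set(SummaryConfigCodec.CONF_KEY, SummaryConfigCodec.encode(cellValue));
    static String encode(byte[] summaryBytes) {
        return Base64.getEncoder().encodeToString(summaryBytes);
    }

    // In each mapper's setup():
    //   byte[] raw = SummaryConfigCodec.decode(
    //       context.getConfiguration().get(SummaryConfigCodec.CONF_KEY));
    static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }
}
```

This avoids any HBase read from the mappers at all, at the cost of carrying the payload in the job configuration, which is only reasonable while the summary stays small.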

I read about the distributed cache and tried to implement it, but that did not work out. I can provide more details in the form of edits if needed; I don't want to clutter this question with details that might turn out to be irrelevant.

Well, this might sound stupid, but if the table we query is really small, we can probably get away with reading the values using the HBase Java API (even inside a MapReduce job) and storing them in static variables. That way, the values are read only once per mapper JVM, which is not much of an overhead.
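The read-once idea above can be sketched as a lazily initialized static holder. The `Supplier` stands in for the actual HBase read (e.g. a `Get` against the temporary table followed by deserialization), which is not shown; everything else is plain Java:

```java
import java.util.function.Supplier;

// Load-once cache shared by all map() calls in a mapper JVM. The loader
// stands in for the real HBase read so the pattern is self-contained.
class SummaryCache {

    private static volatile Object cached;

    // Returns the cached summary, invoking the loader at most once per JVM
    // (double-checked locking on the volatile field).
    static Object getOrLoad(Supplier<Object> loader) {
        Object local = cached;
        if (local == null) {
            synchronized (SummaryCache.class) {
                local = cached;
                if (local == null) {
                    local = loader.get();   // one HBase round-trip per mapper JVM
                    cached = local;
                }
            }
        }
        return local;
    }
}
```

In the mapper you would call `SummaryCache.getOrLoad(...)` from `setup()`, so every `map()` invocation sees the same object without touching HBase again. One caveat: with JVM reuse enabled, the static value survives across tasks, which is fine here because the summary does not change during Job 2.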
