
How to use an HBase secondary index table as an input in a MapReduce job?

Original 2019-04-23 13:18:49 6 1 hadoop/ mapreduce/ hbase

I am new to HBase. I have a main table with rowkey = id-YYYYMMDD, and a secondary index table with rowkey = YYYYMMDD-id and a column holding the corresponding rowkey in the main table. I will have about 1 million ids in the near future, and I will need to create a MapReduce job to summarize the ids for a given date (YYYYMMDD).

How do I pass the secondary index table to the MapReduce job so that the corresponding "get(rowkey)" calls are run against the main table to fetch the columns and summarize the data?

1 answer

You have 2 options:

1. Run a scan on the index table. The Scan will have a startRow and a stopRow (e.g. '20190401' and '20190402'), so it will read one contiguous area of the key space and collect the main-table row keys stored in the index. Time complexity will be O(M), where M is the number of items in the given batch (here, the rows for one date). Then you request the data from the main table by those row keys using Get.
2. Since the date is also part of your main table key, you can just do a MapReduce scan over the main table with row key filtering, which will run in O(N/P), where N is the total number of rows in the table and P is the parallelism of your cluster.
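To make the two access patterns concrete, here is a minimal in-memory sketch in plain Java. It is not HBase code: a `TreeMap` stands in for HBase's sorted key space, `subMap` stands in for a Scan with a start row and a stop row, and `main.get` stands in for a Get. The class name, the helper names, and the single `long` value per row are all assumptions made for illustration. In a real job, one plausible wiring would be to hand the index-table Scan to `TableMapReduceUtil.initTableMapperJob` and issue the Gets against the main table from inside the mapper.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for the two tables; not the HBase client API.
final class IndexScanSketch {

    // Writes one row to the main table (id-YYYYMMDD) and its index entry (YYYYMMDD-id).
    static void put(NavigableMap<String, Long> main, NavigableMap<String, String> index,
                    String id, String date, long value) {
        String mainKey = id + "-" + date;
        main.put(mainKey, value);
        index.put(date + "-" + id, mainKey); // the index column holds the main-table row key
    }

    // Option 1: range-scan the index over [startRow, stopRow), then one lookup per hit.
    static long sumViaIndex(NavigableMap<String, String> index, NavigableMap<String, Long> main,
                            String startRow, String stopRow) {
        long total = 0;
        // Start row inclusive, stop row exclusive, as in an HBase Scan.
        for (String mainKey : index.subMap(startRow, true, stopRow, false).values()) {
            Long value = main.get(mainKey); // stands in for a Get on the main table
            if (value != null) {
                total += value;
            }
        }
        return total;
    }

    // Option 2: visit every main-table row and keep those whose key ends with the date.
    static long sumViaFullScan(NavigableMap<String, Long> main, String date) {
        long total = 0;
        String suffix = "-" + date;
        for (Map.Entry<String, Long> row : main.entrySet()) {
            if (row.getKey().endsWith(suffix)) {
                total += row.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> main = new TreeMap<>();
        NavigableMap<String, String> index = new TreeMap<>();
        put(main, index, "a", "20190401", 5L);
        put(main, index, "b", "20190401", 7L);
        put(main, index, "a", "20190402", 11L);
        put(main, index, "c", "20190331", 13L);
        System.out.println(sumViaIndex(index, main, "20190401", "20190402"));
        System.out.println(sumViaFullScan(main, "20190401"));
    }
}
```

Both methods return the same total for a given date. The difference is how many rows each one touches: the first reads only the index entries for that date plus one main-table lookup per entry, while the second reads every row of the main table and relies on the cluster's parallelism to keep that affordable.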

HBase table as MapReduce input?

Synchronize data to HBase/HDFS and use it as input to MapReduce job

HBase mapreduce job - Multiple scans - How to set the table of each Scan

How to import a CSV into HBASE table using MapReduce

How to give output one mapreduce job as input of another mapreduce job?

How does HBase mapreduce job communicate with server? (newbie question)

Use some datatype as input for a MapReduce job.

Using an HBase table as MapReduce source

Mapfile as a input to a MapReduce job

Hbase mapreduce job: all column values are null




solution1 0 2019-04-26 17:39:59
