
HBase scan vs MapReduce for on-the-fly computation

Original 2014-11-21 15:08:11 3 1 java/ performance/ hadoop/ mapreduce/ hbase

I need to compute an aggregate over an HBase table.

Say I have this HBase table: 'metadata', with column family M and column n.

Here the metadata object has a list of strings:

class metadata {
    List<String> tags;
}

I need to compute the count of tags, and I was thinking of using either MapReduce or a scan over HBase directly.

The result has to be returned on the fly. So which one can I use in this scenario: scan over HBase and compute the aggregate, or MapReduce?

MapReduce is ultimately going to scan HBase and compute the count anyway.

What are the pros and cons of using either of these?

1 answer

I suspect you're not aware of the pros and cons of HBase: it's not suited for computing real-time aggregations over large datasets.

To begin with, a MapReduce job is a scheduled batch job, so you won't be able to return the response on the fly: expect no less than 15 seconds just for the TaskTracker to initialize the job.

In the end, the MapReduce job will do exactly the same thing: an HBase scan. The difference between performing the scan right away and running it through MapReduce is just parallelization and data locality, which pay off when you have millions or billions of rows. If your queries only need to read a few thousand consecutive rows to aggregate them, sure, you could just do a scan and it will probably have an acceptable response time, but for larger datasets it's just going to be impossible to do that at query time.
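To make the "scan a few thousand consecutive rows" case concrete, here is a minimal sketch of a query-time aggregation. It is not HBase client code: a sorted in-memory TreeMap stands in for the table (HBase also keeps rows sorted by row key), and the class name ScanAggregate, the method countTags, and the sample row keys are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ScanAggregate {

    // Counts tag occurrences over the rows in [startRow, stopRow).
    // The NavigableMap is an assumed stand-in for a table sorted by row key;
    // the cost grows linearly with the number of rows in the range.
    public static Map<String, Long> countTags(
            NavigableMap<String, List<String>> table, String startRow, String stopRow) {
        Map<String, Long> counts = new HashMap<>();
        // Start inclusive, stop exclusive, similar to a bounded range scan.
        for (List<String> tags : table.subMap(startRow, true, stopRow, false).values()) {
            for (String tag : tags) {
                counts.merge(tag, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        NavigableMap<String, List<String>> table = new TreeMap<>();
        table.put("row-001", Arrays.asList("java", "hbase"));
        table.put("row-002", Arrays.asList("hbase"));
        table.put("row-003", Arrays.asList("hadoop", "hbase"));

        // Only row-001 and row-002 fall inside the scanned range.
        System.out.println(countTags(table, "row-001", "row-003"));
    }
}
```

Every request repeats the whole loop, which is fine for a short range and is exactly what stops working once the range covers millions of rows.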

HBase is best suited for handling tons of atomic reads and writes. That way, you can maintain those aggregations in real time, no matter how many pre-aggregated counters you need or how many requests you receive: with a proper row key design and split policy you can scale to satisfy the demand.

Think of it as a word count: you could store all the words in a list and count them at query time when requested, or you could process that list at insert time and store the number of times each word is used in the document, both as a global counter and in daily, monthly, yearly, per-country, and per-author tables (or even column families).
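The insert-time approach can be sketched as follows. This is an in-memory illustration, not HBase code: a ConcurrentHashMap of LongAdder values stands in for a counter table, and the class TagCounters, its methods, and the "global|tag" and "daily|day|tag" key layout are hypothetical. In HBase itself the increment would typically be an atomic counter operation on a counter row (for example via Table.incrementColumnValue), with the key layout decided by your row key design.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class TagCounters {

    // Assumed stand-in for a counter table: counter key -> running total.
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Called once per inserted metadata object: bumps every counter the tags feed.
    public void recordInsert(List<String> tags, String day) {
        for (String tag : tags) {
            increment("global|" + tag);
            increment("daily|" + day + "|" + tag);
        }
    }

    private void increment(String key) {
        counters.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    // Query time is a single lookup, independent of how many rows were inserted.
    public long read(String key) {
        LongAdder adder = counters.get(key);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        TagCounters counters = new TagCounters();
        counters.recordInsert(Arrays.asList("hbase", "java"), "2014-11-21");
        counters.recordInsert(Arrays.asList("hbase"), "2014-11-22");

        System.out.println(counters.read("global|hbase"));
        System.out.println(counters.read("daily|2014-11-21|hbase"));
    }
}
```

The trade-off is that each write does a little more work (one increment per counter it touches) so that each read becomes a constant-time lookup.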


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.


solution1 0 ACCEPTED 2014-12-27 13:22:16
