
distributed analysis of hbase data

Original · 2012-09-30 04:06:09 · Tags: java, hadoop, hbase, distributed

I'm a bit new to HBase. I have been able to set up HBase and query the data that's being stored on multiple Hadoop machines, but I'm wondering if it's possible to distribute the analysis of the data in HBase as well.

Here's my situation: I have a few billion records that I need to analyse quickly, and I would like to have X servers query the database, each getting a distinct part of the result to work on, instead of having a single server go through the entire dataset. Is this possible, and how can I do it?

I'm very unsure how to approach this, because I realize all the queries will need to be coordinated (each server cannot query HBase individually, otherwise HBase will not know how to split the request among the servers). I'm confused, but I thought there might be a native way to do this in Hadoop?

If it helps, my application is written in Java, and I'm running the cluster on EC2 using the Cloudera distribution.

1 answer

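HBase builds on Hadoop for a reason :) You can use Hadoop's MapReduce framework to distribute the analysis and let Hadoop/HBase take care of spreading the load. When a MapReduce job reads from an HBase table, the input is split by region, so each map task scans only its own slice of the rows; that is the coordination you were looking for. The docs are a good place to start to see what can be done.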
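To make the "split by region, one task per split, merge at the end" idea concrete, here is a minimal stdlib-only sketch. It does not use HBase at all: the "table" is a sorted in-memory map, and the region boundaries, row keys and "click"/"view" values are hypothetical stand-ins. Each worker thread plays the role of a map task that scans only its own key range, and the final loop plays the role of the reduce step.

```java
import java.util.*;
import java.util.concurrent.*;

/**
 * In-memory sketch of how a MapReduce job over an HBase table divides work.
 * Hypothetical: no HBase classes are used; the table is a sorted TreeMap and the
 * region boundaries are made up for illustration.
 */
public class RegionSplitSketch {

    // One key range [startKey, endKey); an empty endKey means "to the end of the table".
    record Split(String startKey, String endKey) {}

    // Builds contiguous, non-overlapping splits from sorted boundaries (one per "region").
    static List<Split> splitsFromBoundaries(List<String> boundaries) {
        List<Split> splits = new ArrayList<>();
        String start = "";
        for (String b : boundaries) {
            splits.add(new Split(start, b));
            start = b;
        }
        splits.add(new Split(start, ""));
        return splits;
    }

    // "Map" phase: counts values, looking only at the rows inside this split.
    static Map<String, Long> mapSplit(NavigableMap<String, String> table, Split split) {
        NavigableMap<String, String> slice = split.endKey().isEmpty()
                ? table.tailMap(split.startKey(), true)
                : table.subMap(split.startKey(), true, split.endKey(), false);
        Map<String, Long> counts = new HashMap<>();
        for (String value : slice.values()) {
            counts.merge(value, 1L, Long::sum);
        }
        return counts;
    }

    // Runs one task per split in parallel, then "reduces" by summing the partial counts.
    static Map<String, Long> run(NavigableMap<String, String> table, List<Split> splits, int workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<String, Long>>> partials = new ArrayList<>();
            for (Split s : splits) {
                partials.add(pool.submit(() -> mapSplit(table, s)));
            }
            Map<String, Long> total = new TreeMap<>();
            for (Future<Map<String, Long>> f : partials) {
                f.get().forEach((k, v) -> total.merge(k, v, Long::sum));
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            table.put(String.format("row%04d", i), i % 3 == 0 ? "click" : "view");
        }
        List<Split> splits = splitsFromBoundaries(List.of("row0250", "row0500", "row0750"));
        System.out.println(run(table, splits, 4)); // {click=334, view=666}
    }
}
```

Because the splits are contiguous and non-overlapping, no two workers ever see the same row, which is why the workers need no further coordination. In a real job the equivalent wiring is done by `TableMapReduceUtil.initTableMapperJob` with a `Scan` and a `TableMapper` subclass, and `TableInputFormat` derives the splits from the table's regions; check the API docs for your HBase version for exact signatures.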

Another option is to write coprocessors. Coprocessors run on the region servers, so the computation happens close to the data. You can find a nice intro here.
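The benefit of running close to the data is that each region server can reduce its rows to a small partial result, and only those partials travel to the client. The stdlib-only sketch below models that pattern for a global average; it is not the coprocessor API, and the "regions" are just hypothetical lists of numbers.

```java
import java.util.*;

/**
 * Sketch of the coprocessor pattern: aggregate locally per region, merge partials on
 * the client. Hypothetical: regions are plain lists; no HBase classes are used.
 */
public class PartialAggregateSketch {

    // What one region would send back: a sum and a row count, never the rows themselves.
    record Partial(double sum, long count) {
        Partial merge(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
    }

    // Runs "next to the data": reduces one region's values to a Partial.
    static Partial aggregateRegion(List<Double> regionValues) {
        double sum = 0;
        for (double v : regionValues) {
            sum += v;
        }
        return new Partial(sum, regionValues.size());
    }

    // Runs on the client: merges the partials, then derives the global mean.
    static double globalMean(List<List<Double>> regions) {
        Partial total = new Partial(0, 0);
        for (List<Double> region : regions) {
            total = total.merge(aggregateRegion(region));
        }
        return total.count() == 0 ? Double.NaN : total.sum() / total.count();
    }

    public static void main(String[] args) {
        List<List<Double>> regions = List.of(
                List.of(1.0, 2.0, 3.0),
                List.of(4.0, 5.0),
                List.of(6.0));
        System.out.println(globalMean(regions)); // 3.5
    }
}
```

The partial carries a sum and a count rather than a per-region mean: averaging the three region means (2.0, 4.5, 6.0) would give about 4.17 instead of the correct 3.5, because the regions hold different numbers of rows. Some HBase versions ship a ready-made aggregation endpoint (`AggregateImplementation` on the server, `AggregationClient` on the client) built on the same idea; whether it is available depends on your version.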

Related questions

Statistical analysis of distributed data values in Java

Distributed Cluster Hadoop and Hbase

Distributed multimap based on HBase and Hadoop MapReduce

Hbase Pseudo distributed mode not run in localhost

How does HBase internally analysis “hbase shell command”?

How to run HBase in distributed mode on windows without cygwin?

HBase Column data types

Writing data to HBase

Load data into Hbase

HBase data persistence


The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.



Solution 1 (accepted), 2012-09-30 07:29:27