
What is the best approach: creating multiple HBase tables or multiple column families in a single HBase table?

Original 2014-06-09 13:11:20 6 3 hadoop/hbase

My HBase row keys are different, and I also need to aggregate the data and store it separately. Which approach is best for this use case?

I am refining my question.

Below is my use case.

I am processing weblogs that contain retailer, category, and product clicks.

1. I store the weblogs above in one HBase table (Log), using a separate row key for each click type and the same column family. For example:
- A. Retailer: IP | DateTime | Sid | Retailer
- B. Category: IP | DateTime | Sid | Retailer | Category
- C. Product: IP | DateTime | Sid | Retailer | Category | Product

2. From the table above I calculate daily clicks and store them in other HBase tables (Retailer_Day_cnt, Category_Day_Cnt, Product_Day_Cnt).

My question: what is the best way to store the data in HBase for cases 1 and 2 above, separate HBase tables or separate column families?

Note: In case 1 I only write, but in case 2 I do multiple reads and writes.

Thanks in advance, Surendra

3 answers

From a performance perspective, the fewer column families the better. All the column families in a table are flushed at the same time, even if some of them hold very little data, which makes flushes less efficient. If your table is write-heavy, this results in a lot of HFiles, which leads to more compactions and in turn to more GC pauses, and that can make the whole of HBase very slow. So avoid multiple column families unless you really need them, or unless all column families will hold roughly the same amount of data.
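To make the flush argument concrete, here is a toy simulation in plain Python. It is not HBase code and only a sketch: it assumes a flush is triggered once the combined in-memory size of all column families reaches a threshold, and that every family is then written out together. The 128 MB threshold, the family names `big` and `small`, and the write rates are made-up illustration values.

```python
# Toy model, not HBase code: all column families of a table are flushed
# together once their combined in-memory size reaches a threshold.
FLUSH_THRESHOLD_MB = 128  # assumed value, for illustration only


def simulate_flushes(write_mb_per_tick, ticks):
    """Return {family: [size in MB of each flushed file]}."""
    memstore = {family: 0 for family in write_mb_per_tick}
    flushed = {family: [] for family in write_mb_per_tick}
    for _ in range(ticks):
        for family, mb in write_mb_per_tick.items():
            memstore[family] += mb
        if sum(memstore.values()) >= FLUSH_THRESHOLD_MB:
            # Every family is flushed, even one holding very little data.
            for family in memstore:
                flushed[family].append(memstore[family])
                memstore[family] = 0
    return flushed


# Hypothetical workload: one heavy family and one light family.
files = simulate_flushes({"big": 15, "small": 1}, ticks=80)
for family, sizes in files.items():
    print(family, "files:", len(sizes), "avg MB:", sum(sizes) / len(sizes))
```

In this model both families end up with the same number of files, but the files of the light family are tiny, which is the kind of output that later has to be compacted.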

Find more details here: HBase Book

Similar question

This depends on your use case.

If you have the same row key but different data, you can divide the data into different column families. But if the row keys are different, put the data into different tables.
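As a rough picture of that rule, the sketch below models a table as nested dictionaries (row key, then column family, then qualifier). This is only an in-memory illustration, not the HBase client API; the helper names `new_table` and `put`, the family names, and the sample values are all hypothetical.

```python
# Hypothetical in-memory model: row key -> column family -> qualifier -> value.
from collections import defaultdict


def new_table():
    return defaultdict(lambda: defaultdict(dict))


def put(table, row_key, family, qualifier, value):
    table[row_key][family][qualifier] = value


# Same row key, different kinds of data: one table, two column families.
profile = new_table()
put(profile, "retailer42", "info", "name", "Acme")
put(profile, "retailer42", "stats", "day_clicks", 17)

# Different row key shapes: separate tables, each with a single family.
raw_log = new_table()
day_counts = new_table()
put(raw_log, "10.0.0.1|2014-06-09 13:11:20|s1|Acme", "d", "url", "/home")
put(day_counts, "Acme|2014-06-09", "d", "clicks", 17)

# Families stored under the one shared row key.
print(sorted(profile["retailer42"]))
```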

It also depends on whether you have a single-write, multiple-read pattern (i.e. low write throughput is OK) or you want high write throughput, and on how your data is distributed. If one column family has a lot more data (in size) than the rest of the column families, it is better to put the column families into different tables.

If you give more details on your use case, I can be more specific.

Row key design is the main challenge in these scenarios. If you can design your row key so that it serves all of your purposes, you may proceed with different column families; otherwise, multiple tables are the only option. In your case, it seems you are storing the aggregated result in a second table, which must have a different logical row key. So you should go with the two-table approach: the first table stores all the inputs (write once, read multiple times) and the second table stores the processed/aggregated data.
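A minimal sketch of the two-table idea, in plain Python without any HBase client. The raw keys follow the layouts from the question; the aggregate key layout (day followed by the dimensions) is my own assumption, as are the `|` separator, the function names, and the sample values. It also assumes DateTime starts with `YYYY-MM-DD` and that no field contains the separator.

```python
from collections import Counter

SEP = "|"
# Target table names as given in the question, indexed by dimension count.
DAY_TABLES = ("Retailer_Day_cnt", "Category_Day_Cnt", "Product_Day_Cnt")


def log_row_key(ip, timestamp, sid, retailer, category=None, product=None):
    """Build a Log key: IP | DateTime | Sid | Retailer [| Category [| Product]]."""
    parts = [ip, timestamp, sid, retailer]
    if category is not None:
        parts.append(category)
        if product is not None:
            parts.append(product)
    return SEP.join(parts)


def day_count_key(row_key):
    """Map a Log key to (target table, aggregate row key)."""
    _ip, timestamp, _sid, *dims = row_key.split(SEP)
    table = DAY_TABLES[len(dims) - 1]
    # Assumed aggregate key layout: day, then retailer/category/product.
    return table, SEP.join([timestamp[:10]] + dims)


log_keys = [
    log_row_key("10.0.0.1", "2014-06-09 13:11:20", "s1", "Acme"),
    log_row_key("10.0.0.2", "2014-06-09 15:15:41", "s2", "Acme"),
    log_row_key("10.0.0.1", "2014-06-09 13:12:00", "s1", "Acme", "Shoes"),
    log_row_key("10.0.0.1", "2014-06-10 09:30:35", "s1", "Acme", "Shoes", "Boot"),
]

# Count clicks per (table, aggregate key); a real job would write these out.
day_counts = Counter(day_count_key(key) for key in log_keys)
for (table, key), clicks in sorted(day_counts.items()):
    print(table, key, clicks)
```

The point is that the aggregate key has a different shape from the raw key, so the counts do not fit naturally as extra column families on the Log rows.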

Related questions:

HBase multiple column families performance

Import multiple column families from hbase to hive

How to put values inside multiple column families in hbase

Hadoop Hbase: Spreading column families across tables or not

HBase: Create multiple tables or single table with many columns?

what is more efficient in performance of hbase,multiple tables of same structure or a single table containing large set of data?

If it is ok to create a hbase table with 300 column families?

Design for columns and column families in Hbase

Approach to upload multiple interconnected csv files to HBase

HBase multiple table scans for the job

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

3 answers

solution1
1 2015-07-14 14:30:24

solution2
0 2014-06-09 15:15:41

solution3
0 2014-06-10 09:30:35
