
Best rowkey design for HBase

I come from a SQL background and am missing some basic concepts in HBase. My MySQL data is divided into 5 columns, two of which I need for filtering. In SQL the query is straightforward: I can put an index on these two columns and fetch rows based on ranges over those two columns in my WHERE clause.

The values in these two columns increase monotonically, like a timestamp. What is the best way to design this in HBase? I am considering using the timestamp as the rowkey, with some measure against hotspotting. But then every query needs a range on the rowkey, followed by a scan whose results I filter on the second column, and I am not sure whether that is fast enough. So what is the HBase equivalent of an index?
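To make the trade-off in the question concrete, here is a minimal sketch that simulates HBase's sorted rowkey space with a sorted Python list (no HBase client is involved). It assumes a hypothetical salting scheme, salt = timestamp % NUM_BUCKETS, stored as a one-byte prefix; `salted_key`, `scan` and `query` are illustrative names only. The cost shows up in `query`: every time-range query turns into one scan per salt bucket, and the second column can only be filtered after the rows come back.

```python
import bisect
import struct

NUM_BUCKETS = 4  # hypothetical number of salt buckets / pre-split regions

def salted_key(ts: int) -> bytes:
    """Rowkey = 1-byte salt + 8-byte big-endian timestamp.

    Big-endian unsigned encoding makes byte order match numeric order,
    which matters because HBase sorts rowkeys as raw bytes.
    """
    salt = ts % NUM_BUCKETS  # consecutive timestamps land in different buckets
    return struct.pack(">BQ", salt, ts)

def scan(table, start: bytes, stop: bytes):
    """Simulate a scan: rows with start <= key < stop, in key order."""
    keys = [k for k, _ in table]
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    return table[lo:hi]

def query(table, ts_from: int, ts_to: int, col2_min: int, col2_max: int):
    """Time range [ts_from, ts_to) via the rowkey, then filter on column 2.

    Salting forces one range scan per bucket; the second column is not
    part of the key, so it is filtered only after the rows are read.
    """
    out = []
    for salt in range(NUM_BUCKETS):
        start = struct.pack(">BQ", salt, ts_from)
        stop = struct.pack(">BQ", salt, ts_to)
        for key, col2 in scan(table, start, stop):
            if col2_min <= col2 <= col2_max:
                out.append(struct.unpack(">BQ", key)[1])
    return sorted(out)

# A sorted "table" of (rowkey, second_column) pairs.
table = sorted((salted_key(ts), ts * 10) for ts in range(100, 120))
print(query(table, 105, 110, 1060, 1090))  # [106, 107, 108, 109]
```

Raising NUM_BUCKETS spreads writes over more regions but multiplies the number of scans per query, which is exactly the tension the question describes.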

One more important point: I only want to load the data once and then perform read requests only.

Any help is highly appreciated.

The rowkey needs to be unique. Yes, you can use the time for that, but I think you should combine the timestamp with another parameter, for instance timestamp+userId. That is safer: imagine you have many HBase clients writing to an HBase server; two clients can write at the same time. Of course, you don't need to put all of your properties into the rowkey; that would not be right.
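A minimal sketch of the collision this answer warns about, assuming millisecond timestamps and a non-negative integer userId (both assumptions; the fixed-width big-endian layout and the function names are illustrative choices, not an HBase API). A dict stands in for the table: equal rowkeys address the same row.

```python
import struct

def key_ts_only(ts_millis: int) -> bytes:
    # Timestamp alone: two events in the same millisecond share a rowkey.
    return struct.pack(">Q", ts_millis)

def key_ts_user(ts_millis: int, user_id: int) -> bytes:
    # Timestamp + userId as fixed-width big-endian fields: rows still sort
    # by time first, then by user, and same-millisecond writes stay apart.
    return struct.pack(">QQ", ts_millis, user_id)

# Two clients write at the same millisecond for different users.
events = [(1700000000000, 42, "login"), (1700000000000, 7, "logout")]

# Equal rowkeys collapse into one dict entry, as they would into one row.
rows_ts_only = {key_ts_only(ts): v for ts, _, v in events}
rows_ts_user = {key_ts_user(ts, uid): v for ts, uid, v in events}

print(len(rows_ts_only), len(rows_ts_user))  # 1 2
```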

It depends on which types of queries you will perform most often. If you will mostly need to filter on one column, then I would suggest putting this column together with a timestamp in the rowkey. For example:

rowkey = shardKey + column + timestamp

If you use both for filtering, then:

rowkey = shardKey + column1 + column2 + timestamp

In the first case, shardKey should probably be something like hash(column) % number_of_regions, and in the second, hash(column1 + column2) % number_of_regions. Because the shardKey is computed from the same column values you query by, you can always get the time-series data for a specific column1 and column2 combination. Or, if you need both access patterns, consider having several tables, since you are only going to write once.
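A minimal sketch of the two-column layout, again simulating the table's sorted rowkeys with a Python list. The answer does not name a hash function; crc32 from `zlib` is an assumption here (Python's built-in `hash()` of strings and bytes is randomized per process, so it is unsuitable for keys). NUMBER_OF_REGIONS, `shard_key`, `row_key` and `scan_range` are hypothetical names, and both columns are assumed to be non-negative integers.

```python
import bisect
import struct
import zlib

NUMBER_OF_REGIONS = 8  # hypothetical; match the table's pre-split count

def shard_key(col1: int, col2: int) -> int:
    # Stable hash of column1 + column2, reduced modulo the region count.
    return zlib.crc32(struct.pack(">QQ", col1, col2)) % NUMBER_OF_REGIONS

def row_key(col1: int, col2: int, ts: int) -> bytes:
    # rowkey = shardKey + column1 + column2 + timestamp, fixed-width big-endian
    return struct.pack(">BQQQ", shard_key(col1, col2), col1, col2, ts)

def scan_range(col1: int, col2: int, ts_from: int, ts_to: int):
    # Start (inclusive) and stop (exclusive) rows for one combination.
    # The shard is derived from the query values, so one scan suffices.
    return row_key(col1, col2, ts_from), row_key(col1, col2, ts_to)

# Sorted list standing in for the table's rowkeys.
keys = sorted(row_key(c1, c2, ts)
              for c1 in range(3) for c2 in range(3) for ts in range(1000, 1010))

start, stop = scan_range(1, 2, 1003, 1006)
hits = keys[bisect.bisect_left(keys, start):bisect.bisect_left(keys, stop)]
print([struct.unpack(">BQQQ", k)[3] for k in hits])  # [1003, 1004, 1005]
```

This works when queries supply exact column1/column2 values; a range over column1 would no longer map to a single shard, so it would again need one scan per shard.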
