
HBase rowkey design

I am moving from MySQL to HBase because my data volume keeps growing.

I am designing a rowkey that supports my access patterns efficiently.

I want to achieve 3 goals.

  1. Get all results for an email address
  2. Get all results for an email address + item_type
  3. Get all results for a particular email address + item_id

I have 4 attributes to choose from

  1. user email
  2. reverse timestamp
  3. item_type
  4. item_id

What should my rowkey look like to get rows efficiently?

Thanks

Assuming your main access is by email, you can use email + reverse timestamp + item_id as the key of your main table (assuming item_id gives you uniqueness).
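A minimal sketch of how such a key could be composed, in plain Python with no HBase client involved. The `\x00` separator, the 19-digit zero padding, and the helper names `reverse_ts`, `main_key` and `email_prefix` are assumptions made for illustration; the sorted list only imitates the lexicographic rowkey ordering that HBase uses.

```python
MAX_LONG = 2**63 - 1  # mirrors Java's Long.MAX_VALUE
SEP = "\x00"          # assumed separator; must not occur in any field


def reverse_ts(ts_millis: int) -> str:
    """Fixed-width reverse timestamp, so newer events sort first."""
    return f"{MAX_LONG - ts_millis:019d}"


def main_key(email: str, ts_millis: int, item_id: str) -> bytes:
    """Rowkey for the main table: email + reverse timestamp + item_id."""
    return SEP.join([email, reverse_ts(ts_millis), item_id]).encode("utf-8")


def email_prefix(email: str) -> bytes:
    """Prefix matching every row of one email (goal 1 in the question)."""
    return (email + SEP).encode("utf-8")


# Sorting the keys stands in for HBase's lexicographic row ordering.
keys = sorted([
    main_key("a@x.com", 1000, "i1"),
    main_key("a@x.com", 2000, "i2"),
    main_key("b@x.com", 1500, "i3"),
])

# A prefix filter over the sorted keys imitates a prefix scan.
hits = [k for k in keys if k.startswith(email_prefix("a@x.com"))]
```

Here `hits` holds the two rows for a@x.com with the newer one (i2) first. Including the separator in the prefix keeps a@x.com from also matching a longer address that merely starts with the same characters.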

You can also keep additional "index" tables, keyed by email + item_type + reverse timestamp + item_id and by email + item_id, that map to the key in the first table (so retrieving by these is a two-step process).
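The two-step lookup could look roughly like the sketch below. Plain dictionaries stand in for the HBase tables, and the names `main`, `by_type`, `by_item`, `put` and `scan` are hypothetical. It also assumes that item_id is unique per email, as the answer does; otherwise the email + item_id index row would be overwritten.

```python
MAX_LONG = 2**63 - 1
SEP = "\x00"  # assumed separator; must not occur in any field

# Dictionaries stand in for three HBase tables.
main, by_type, by_item = {}, {}, {}


def rev(ts_millis: int) -> str:
    return f"{MAX_LONG - ts_millis:019d}"


def key(*parts: str) -> str:
    return SEP.join(parts)


def put(email, ts, item_type, item_id, payload):
    """Write the row once, plus one pointer row per index table."""
    mk = key(email, rev(ts), item_id)
    main[mk] = payload
    by_type[key(email, item_type, rev(ts), item_id)] = mk
    by_item[key(email, item_id)] = mk  # assumes item_id is unique per email


def scan(table, *prefix_parts):
    """Imitate a prefix scan: rows in key order whose key starts with prefix."""
    prefix = key(*prefix_parts) + SEP
    return [(k, table[k]) for k in sorted(table) if k.startswith(prefix)]


put("a@x.com", 1000, "book", "i1", {"v": 1})
put("a@x.com", 2000, "film", "i2", {"v": 2})
put("a@x.com", 3000, "book", "i3", {"v": 3})

# Goal 2: step one scans the index, step two fetches from the main table.
books = [main[mk] for _, mk in scan(by_type, "a@x.com", "book")]

# Goal 3: step one is a point lookup in the index, step two a point get.
item = main[by_item[key("a@x.com", "i2")]]
```

The cost of this layout is one extra read per result, in exchange for storing each row's payload only once.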

Maybe you are already headed in the right direction with concatenated row keys. In any case, the following comes to mind from your post:

The partitioning key likely consists of your reverse timestamp plus the most frequently queried natural key. Would that be the email? Let us suppose so. Then choose the prefix based on which of the two (reverse timestamp or email) gives the more balanced, non-skewed distribution of your records, e.g. reversetimestamp-email or email-reversetimestamp. That makes your region servers happier.

In that manner you will avoid hot spotting on your region servers.
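One way to compare the two orderings is to count how a burst of writes would spread across regions. The sketch below is a toy model only: the split points, the 26 sample emails and the `region_of` helper are invented for illustration, and real region boundaries depend on your data and splitting policy.

```python
from bisect import bisect_right
from collections import Counter

MAX_LONG = 2**63 - 1
SEP = "\x00"  # assumed separator
DAY = 24 * 60 * 60 * 1000


def rev(ts_millis: int) -> str:
    return f"{MAX_LONG - ts_millis:019d}"


def region_of(rowkey: str, split_points: list) -> int:
    """Index of the region whose key range contains rowkey."""
    return bisect_right(split_points, rowkey)


# 26 users (hypothetical addresses) writing within the same few milliseconds.
emails = [f"{c}user@example.com" for c in "abcdefghijklmnopqrstuvwxyz"]
now = 1_700_000_000_000
writes = [(email, now + i) for i, email in enumerate(emails)]

# Hypothetical split points giving four regions for each key layout.
email_splits = ["g", "n", "t"]
ts_splits = [rev(now - 1 * DAY), rev(now - 2 * DAY), rev(now - 3 * DAY)]

email_first = Counter(
    region_of(email + SEP + rev(ts), email_splits) for email, ts in writes
)
ts_first = Counter(
    region_of(rev(ts) + SEP + email, ts_splits) for email, ts in writes
)
```

Under these assumptions all 26 timestamp-first writes land in region 0, while the email-first writes spread over all four regions. The picture would reverse if a single email produced most of the traffic, which is why the choice should follow the actual distribution of your records.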

Good performance on the additional (secondary) indexes is not "baked into" HBase yet; there is a design doc for it (look under SecondaryIndexing in the wiki).

But you can build your own in a couple of ways:

a) use a coprocessor to write the item_type as the rowkey to a separate table, with a column containing the original fact table rowkey (user_email-reverse timestamp, or vice versa)

b) if disk space is not an issue and/or the rows are small, just go ahead and duplicate the entire row in the second (and, for the item_id case, third) tables.
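The difference between the two options can be sketched as the set of writes derived from one incoming record. This is plain Python, not coprocessor code. The table names `items`, `items_by_type` and `items_by_item`, the `fan_out` helper and the `\x00` separator are all hypothetical, and the index keys are prefixed with the email (an assumption, so that index rows for different users do not collide) rather than using item_type alone.

```python
MAX_LONG = 2**63 - 1
SEP = "\x00"  # assumed separator; must not occur in any field


def rev(ts_millis: int) -> str:
    return f"{MAX_LONG - ts_millis:019d}"


def fan_out(rec: dict, duplicate_rows: bool) -> list:
    """Return (table, rowkey, value) writes derived from one record.

    duplicate_rows=False: index rows hold only the fact table rowkey (option a).
    duplicate_rows=True:  index rows hold a full copy of the row (option b).
    """
    r = rev(rec["ts"])
    fact_key = SEP.join([rec["email"], r, rec["item_id"]])
    type_key = SEP.join([rec["email"], rec["item_type"], r, rec["item_id"]])
    item_key = SEP.join([rec["email"], rec["item_id"]])
    index_value = dict(rec) if duplicate_rows else fact_key
    return [
        ("items", fact_key, dict(rec)),
        ("items_by_type", type_key, index_value),
        ("items_by_item", item_key, index_value),
    ]


rec = {"email": "a@x.com", "ts": 1000, "item_type": "book", "item_id": "i1"}
pointer_writes = fan_out(rec, duplicate_rows=False)
copy_writes = fan_out(rec, duplicate_rows=True)
```

With pointers, a read by item_type needs a second lookup in the fact table. With full copies, the index table answers the query directly at the cost of storing each row three times.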
