
HBase Row Key Split Algorithm

I am trying to store some data for every phone number in HBase. The row key I will be using is reverse(PhoneNumber) for better distribution, since most numbers for a particular country start with the same country code, which would lead to hot-spotting. I will be moving this data from MySQL to HBase.

I took a random sample of 1 million phone numbers and created 200 splits with UniformSplit and HexStringSplit, the two predefined split algorithms in HBase.

With UniformSplit, only 8 regions get data. With HexStringSplit, 81 regions get data.

Is there any other split algorithm or strategy I can use?

If you want to use one of these algorithms, you should probably use a different row key design. I can suggest the following schema: compute an MD5 or similar hash of the phone number and use its first few characters as a salt. In this case, the row key will be

salt+phoneNumber

In this case, you will have a more uniform distribution, to which you can apply one of the default split algorithms.
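A minimal sketch of the salt+phoneNumber key, assuming the phone number is handled as a string and the salt is the first few lowercase hex characters of its MD5 digest. The class name Md5SaltedKey, the method rowKey and the saltLength parameter are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5SaltedKey {
    // Returns salt + phoneNumber, where salt is the first saltLength
    // hex characters of MD5(phoneNumber). Names are illustrative only.
    static String rowKey(String phoneNumber, int saltLength) {
        if (saltLength < 1 || saltLength > 32) {
            throw new IllegalArgumentException("saltLength must be between 1 and 32");
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(phoneNumber.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                // Mask to 0..255 so every byte becomes exactly two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.substring(0, saltLength) + phoneNumber;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical phone number, used only to show the key shape.
        System.out.println(rowKey("14155550123", 4));
    }
}
```

Because the salt is derived from the phone number itself, a reader can recompute the full row key for a point lookup. A range scan over consecutive phone numbers is no longer possible with this key.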

I would generally agree with @alexander-kuznetsov, but using just md5 or hash won't solve the issue.

I would suggest the following design:

rowKey = (phoneNumber % number_of_regions) + phoneNumber
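The formula can be sketched as follows, reading the + as string concatenation. The zero-padding of the bucket is an assumption added here so that the prefixes have a fixed width and sort lexicographically in numeric order. The class name ModuloSaltedKey and the method rowKey are illustrative.

```java
import java.util.Locale;

class ModuloSaltedKey {
    // Returns (phoneNumber % numberOfRegions) + phoneNumber as a string.
    // The bucket is zero-padded to a fixed width (an assumption of this
    // sketch, the formula itself does not specify padding).
    static String rowKey(long phoneNumber, int numberOfRegions) {
        if (numberOfRegions < 1 || phoneNumber < 0) {
            throw new IllegalArgumentException("need numberOfRegions >= 1 and phoneNumber >= 0");
        }
        int width = String.valueOf(numberOfRegions - 1).length();
        long bucket = phoneNumber % numberOfRegions;
        return String.format(Locale.ROOT, "%0" + width + "d", bucket) + phoneNumber;
    }

    public static void main(String[] args) {
        // Hypothetical phone number, used only to show the key shape.
        System.out.println(rowKey(14155550123L, 200));
    }
}
```

The bucket count is baked into every stored key, so it has to stay fixed once data has been written.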

Here I assume that the phone number is a Long or Int. This will distribute row keys according to the number of regions. I also usually pre-split the table before starting to insert data, with the following method from the HBase Admin API:

void createTable(TableDescriptor desc,
             byte[] startKey,
             byte[] endKey,
             int numRegions)
      throws IOException
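When the row key starts with a fixed set of bucket prefixes, the split points can also be listed explicitly instead of being derived from a start and end key. A minimal sketch, assuming row keys begin with a zero-padded decimal bucket such as 007 (the padding is an assumption of this sketch) and that the resulting array is handed to the Admin overload createTable(TableDescriptor desc, byte[][] splitKeys), which is a different overload from the one quoted above. The class name SaltSplitKeys and the method splitKeys are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class SaltSplitKeys {
    // Returns numberOfRegions - 1 split points, one per bucket boundary:
    // for 200 regions these are "001", "002", ..., "199".
    static byte[][] splitKeys(int numberOfRegions) {
        if (numberOfRegions < 1) {
            throw new IllegalArgumentException("numberOfRegions must be >= 1");
        }
        int width = String.valueOf(numberOfRegions - 1).length();
        byte[][] splits = new byte[numberOfRegions - 1][];
        for (int bucket = 1; bucket < numberOfRegions; bucket++) {
            String prefix = String.format(Locale.ROOT, "%0" + width + "d", bucket);
            splits[bucket - 1] = prefix.getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        byte[][] splits = splitKeys(200);
        System.out.println(splits.length + " split points, first = "
                + new String(splits[0], StandardCharsets.US_ASCII));
    }
}
```

With these split points each bucket prefix maps to exactly one region, so the region count and the bucket count stay in step.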

And I usually specify split policy too:

 tableDescriptor.setRegionSplitPolicyClassName(ConstantSizeRegionSplitPolicy.class.getName())
