HBase Row Key Split Algorithm

Question

I am trying to store some data for every phone number in HBase. The row key I plan to use is reverse(PhoneNumber) for better distribution, since most numbers for a particular country start with the same country code, which leads to hot-spotting. I will be moving this data from MySQL to HBase.

I took a random sample of 1 million phone numbers and created 200 splits with each of UniformSplit and HexStringSplit, the two predefined split algorithms in HBase.

With UniformSplit, only 8 regions get data. With HexStringSplit, 81 regions get data.

Is there any other split algorithm or strategy I can use?

Answer 1

If you want to use one of these algorithms, you should probably use a different row key design. I can suggest the following scheme: compute an MD5 or similar hash of the phone number and use its first few characters as a salt. The row key then becomes

salt+phoneNumber

This gives a more uniform distribution, to which you can apply one of the default split algorithms.
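
A minimal sketch of the salted key described above, in plain Java with no HBase dependency. The class name `SaltedRowKey`, the method `saltedRowKey`, and the choice of taking the salt from the hex form of the MD5 digest are assumptions for illustration; the answer only says to use the first few characters of a hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: builds "salt + phoneNumber" row keys.
final class SaltedRowKey {

    // Returns the first saltHexChars hex characters of MD5(phoneNumber),
    // followed by the phone number itself. saltHexChars must be in 0..32.
    static String saltedRowKey(String phoneNumber, int saltHexChars) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(phoneNumber.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                // Mask to an unsigned value so each byte becomes two hex chars.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.substring(0, saltHexChars) + phoneNumber;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is not expected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Because the salt is derived from the phone number, a reader can recompute it and still do a point lookup by phone number. A hex salt also makes the keys look like uniformly distributed hexadecimal strings, which is the key shape HexStringSplit is meant for.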

Answer 2

I would generally agree with @alexander-kuznetsov, but using just MD5 or another hash won't solve the issue.

I would suggest the following design:

rowKey = (phoneNumber % number_of_regions) + phoneNumber

Here I assume that the phone number is a Long or Int. This distributes row keys according to the number of regions. I also usually pre-split the table before starting to insert data, using the following method from the HBase Admin API:

void createTable(TableDescriptor desc,
             byte[] startKey,
             byte[] endKey,
             int numRegions)
      throws IOException

And I usually specify split policy too:

tableDescriptor.setRegionSplitPolicyClassName(ConstantSizeRegionSplitPolicy.class.getName());
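
The bucket-prefix row key above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the `+` in the formula is read as string concatenation, the prefix is zero-padded to a fixed width (`PREFIX_WIDTH`, a hypothetical constant) so that keys sort by bucket, and the class and method names are made up. The `splitKeys` helper produces one boundary per bucket, which could be passed to the `Admin.createTable` overload that takes a `byte[][]` of split keys instead of the start key, end key, and region count variant shown above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for "(phoneNumber % number_of_regions) + phoneNumber" keys.
final class BucketedRowKey {

    // Assumed width of the zero-padded bucket prefix; 3 digits covers up to 1000 regions.
    static final int PREFIX_WIDTH = 3;

    // Zero-padded bucket number followed by the phone number.
    static String rowKey(long phoneNumber, int numRegions) {
        long bucket = Math.floorMod(phoneNumber, (long) numRegions);
        return String.format("%0" + PREFIX_WIDTH + "d", bucket) + phoneNumber;
    }

    // numRegions - 1 split points ("001", "002", ...), so each bucket gets its own region.
    static byte[][] splitKeys(int numRegions) {
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            String boundary = String.format("%0" + PREFIX_WIDTH + "d", i);
            splits[i - 1] = boundary.getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }
}
```

For example, with 200 regions the number 14155550123 falls in bucket 123 and gets the key `12314155550123`. The bucket depends only on the phone number, so lookups can recompute the prefix, but changing the number of regions later changes every key.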
