
HBase RowKey schema design

I am using HBase to store web table content, similar to how Google uses Bigtable.
For reference, see the Google Bigtable paper (PDF).
My question is about the RowKey and how it should be formed.
Google stores the URL with the host name reversed, as in "com.cnn.www" in the PDF document, so that all the links associated with cnn.com are kept in the same block of GFS, which makes them much easier to scan.
I could do the same thing as Google, but wouldn't it be better to use some algorithm to compress the URL?

For example:

RowKey                               |  Google Bigtable                      |  Algorithm output
www.cnn.com/index.php                |  com.cnn.www/index.php                |  12as/435
www.cnn.com/news/business/index.html |  com.cnn.www/news/business/index.html |  12as/2as/dcx/asd
www.cnn.com/news/sports/index.html   |  com.cnn.www/news/sports/index.html   |  12as/2as/eds/scf
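
The "Google Bigtable" column in the table above can be produced with a few lines of Python. This is only a minimal sketch: the function name `reverse_host_key` is made up for illustration, and it assumes the URL has no port or user-info part in front of the path.

```python
def reverse_host_key(url: str) -> str:
    """Reverse the host labels of a URL and keep the path unchanged.

    Hypothetical helper for illustration; assumes no port or user-info.
    """
    # Drop an optional scheme such as 'http://'.
    if "://" in url:
        url = url.split("://", 1)[1]
    # Split into host and path at the first '/'; sep is '' if there is no path.
    host, sep, path = url.partition("/")
    # 'www.cnn.com' -> 'com.cnn.www'
    reversed_host = ".".join(reversed(host.split(".")))
    return reversed_host + sep + path


if __name__ == "__main__":
    for u in ("www.cnn.com/index.php", "www.cnn.com/news/sports/index.html"):
        print(reverse_host_key(u))
```

Because all keys for one site then share the same prefix, they sort next to each other and a prefix scan over `com.cnn.www` would cover the whole site.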

The reason for compressing the URL is that the row key will be shorter, as recommended in the HBase schema design guide (see section 6.3.2.3, Rowkey Length).

So what I need from you is to know whether I am correct here.
Also, if I am correct, which algorithm should I use? I am using Python over Thrift as the programming language, so code will be overwhelming for me...

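When you shorten the URI, do it separately for the host and for the path and then concatenate the two, so that your key looks like hostHash!pathHash. On the one hand this keeps things simple, and on the other hand it groups all URIs from the same site together.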
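
A rough Python sketch of the hostHash!pathHash idea follows. The answer does not name a hash function, so the use of SHA-256 truncated to 8 hex characters is an assumption, and the names `short_hash` and `hashed_row_key` are invented for this example.

```python
import hashlib


def short_hash(text: str, length: int = 8) -> str:
    """Return the first `length` hex characters of the SHA-256 digest.

    Assumption: truncated SHA-256; any stable hash would work the same way.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]


def hashed_row_key(url: str) -> str:
    """Build a key of the form hostHash!pathHash (hypothetical helper)."""
    # Drop an optional scheme such as 'http://'.
    if "://" in url:
        url = url.split("://", 1)[1]
    host, _, path = url.partition("/")
    # Hash host and path separately so URLs of one site share a key prefix.
    return short_hash(host) + "!" + short_hash("/" + path)


if __name__ == "__main__":
    for u in ("www.cnn.com/index.php", "www.cnn.com/news/sports/index.html"):
        print(hashed_row_key(u))
```

Both example keys start with the same host hash, so a prefix scan still finds every page of a site. Note that a hash cannot be reversed and a truncated hash can collide, so you would probably still want to keep the original URL in a column.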
