
HBase rowkey which includes timestamp

I would like to know whether it is bad to have rowkeys like the following:

username-timestamp

These rows would be read by MapReduce jobs and written using the Java client API. Also, a subset would be selected using STARTROW and ENDROW.

On one hand, this seems convenient for my use case, since I can scan a specific time interval and the rows are mostly contiguous for the MR job. On the other hand, I have read that it is good to avoid long rowkeys and hotspotting.

Is there really a problem with this design, and if so, how can I overcome it?

I'm new to HBase so any help would be great.

The general advice is to avoid monotonically increasing row keys, because consecutive writes then all land in the same region and create a hotspot. To that end, some software tools prepend a so-called "salt" to the row key, which spreads the keys across regions. A discussion can be found here: http://hbase.apache.org/0.94/book/rowkey.design.html . And here: https://phoenix.apache.org/salted.html . You can also look at Apache Trafodion http://trafodion.apache.org/ , which uses row key salting to distribute SQL-like primary keys.
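To make the salting idea concrete, here is a minimal sketch in plain Python of how a salted username-timestamp key could be laid out. It only builds the key bytes and does not call HBase. The names `NUM_BUCKETS`, `salt_for`, `row_key` and `scan_range` are made up for illustration, and deriving the salt from the username alone is an assumption of this sketch, not necessarily what Phoenix or Trafodion do. It also assumes usernames do not contain the "-" separator.

```python
import hashlib
import struct

NUM_BUCKETS = 16  # hypothetical number of salt buckets; tune to your cluster


def salt_for(username: str) -> bytes:
    """One-byte salt derived from the username only (assumption of this sketch).

    Because the salt does not depend on the timestamp, all rows of one user
    share a prefix and stay contiguous, while different users are spread
    over NUM_BUCKETS key ranges.
    """
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    return bytes([digest[0] % NUM_BUCKETS])


def row_key(username: str, timestamp_ms: int) -> bytes:
    """Build salt + username + '-' + fixed-width big-endian timestamp.

    A fixed-width big-endian encoding makes byte-wise (lexicographic)
    order match numeric order for non-negative timestamps.
    """
    return (
        salt_for(username)
        + username.encode("utf-8")
        + b"-"
        + struct.pack(">Q", timestamp_ms)
    )


def scan_range(username: str, start_ms: int, end_ms: int):
    """Start row (inclusive) and stop row (exclusive) for one user's interval."""
    return row_key(username, start_ms), row_key(username, end_ms)


if __name__ == "__main__":
    start, stop = scan_range("alice", 1_000, 2_000)
    print(start.hex(), stop.hex())
```

With this layout, a scan for one user's time interval still needs only a single start/stop row pair. If the salt were instead computed from the whole key, including the timestamp, one user's rows would be scattered across buckets and a range scan would have to be issued once per bucket.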


 