How to scan a numeric range in HBase

My row keys in HBase are numbers of different lengths, such as 1, 2, 3, ..., 32423480, 32423481.

When I use

scan 'table', {STARTROW => '1', ENDROW => '3'}

to scan the table, I want only the rows with keys 1, 2, and 3, but it returns every row whose key starts with 1, 2, or 3, such as 1003423 and 200034.

Is it possible to filter the row key range numerically, using either the HBase shell or the Java API?

Thanks

I am more familiar with Apache Accumulo (another BigTable implementation), but I believe that HBase operates similarly.

Keys are sorted lexicographically, so, as you've observed, '11' sorts before '2'. Typically you format the keys so that the sort order makes sense in your domain. For instance, if your keys' maximum value is 99999, you could zero-pad every key to 5 characters.

1  becomes 00001
2  becomes 00002
11 becomes 00011
etc

This way HBase will sort your keys according to the expected numeric ordering and you can scan for ranges like (00001, 00003).
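The effect of padding can be checked outside HBase. Below is a minimal Python sketch, assuming keys are plain ASCII digit strings and that a scan behaves like a lexicographic range over sorted keys with an exclusive stop row. The helpers `pad_key` and `scan` are hypothetical stand-ins, not part of any HBase API.

```python
WIDTH = 5  # assumption: no key exceeds 99999


def pad_key(n: int, width: int = WIDTH) -> str:
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if n < 0 or len(str(n)) > width:
        raise ValueError(f"{n} does not fit in {width} digits")
    return str(n).zfill(width)


def scan(keys, start, stop):
    """Hypothetical stand-in for a range scan: start inclusive, stop exclusive."""
    return [k for k in sorted(keys) if start <= k < stop]


numbers = [1, 2, 3, 11, 200, 10034]

raw = [str(n) for n in numbers]
padded = [pad_key(n) for n in numbers]

# Unpadded keys: '10034', '11' and '200' sort between '1' and '3'.
raw_hits = scan(raw, "1", "3")

# Padded keys: stop at 00004 so that 00003 itself is included.
padded_hits = scan(padded, pad_key(1), pad_key(4))

print(raw_hits)
print(padded_hits)
```

Because the stop row is exclusive, the padded scan uses the key one past the last wanted value as its stop row.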

It looks like the keys in your HBase table are stored as strings. That means numbers like 1, 2, and 3 are located in different parts of the table, with many other keys between them. So the answer to your question is: it's not possible to scan the numeric range you want with a single scan operation.

But there are two possible ways to solve your problem:

1) Change the schema of your keys. Convert your keys to integers and store them in HBase in that form. Your keys will then be stored as 4-byte arrays (or 8-byte arrays if you use long integers) and sorted in HBase in numeric order. This schema is memory efficient but isn't shell-friendly, because by default the HBase shell only lets you type keys as strings. If you want a shell-friendly but less memory-efficient approach, you can use the solution provided in jeff's answer.
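To see why fixed-width binary keys sort numerically, here is a small Python sketch. It assumes the integer is written in big-endian byte order (which, as far as I know, is what HBase's `Bytes.toBytes(int)` produces) and that keys are compared byte by byte. The helper `int_key` is hypothetical. The sketch is restricted to non-negative values, since a negative two's-complement value has its high bit set and would sort after the positive ones under byte-wise comparison.

```python
import struct


def int_key(n: int) -> bytes:
    """Encode a non-negative 32-bit integer as 4 big-endian bytes."""
    if not 0 <= n <= 0x7FFFFFFF:
        raise ValueError("sketch only covers non-negative 32-bit integers")
    return struct.pack(">i", n)


numbers = [32423481, 3, 1003423, 1, 200034, 2]

# Python compares bytes objects lexicographically, byte by byte.
keys = sorted(int_key(n) for n in numbers)

# Decode the sorted keys to confirm they come back in numeric order.
decoded = [struct.unpack(">i", k)[0] for k in keys]
print(decoded)
```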

2) If you don't want to move all your data to the new key schema, you can use Get operations instead of Scan. Just issue one Get for every element in your range. Obviously this method is much less efficient than a single scan, but it lets you fetch all the data you want without transforming it.
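The per-key lookup in option 2 can be sketched as follows. This is only an illustration: the dictionary `table` is a hypothetical in-memory stand-in for the HBase table, and `table.get` stands in for a single Get call made through whatever client you use.

```python
# Hypothetical in-memory stand-in for the table: row key -> row data.
table = {"1": "a", "2": "b", "3": "c", "1003423": "x", "200034": "y"}


def get_range(table, first: int, last: int):
    """Issue one point lookup per number in [first, last]; skip missing rows."""
    rows = {}
    for n in range(first, last + 1):
        key = str(n)          # keys keep their existing, unpadded form
        row = table.get(key)  # stands in for a single Get call
        if row is not None:
            rows[key] = row
    return rows


result = get_range(table, 1, 3)
print(result)
```

The number of lookups grows with the width of the numeric range rather than with the number of rows that exist, which is why this approach only suits small ranges.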
