
HBase Filter on StructRowKey byte[] Array key


This is my table. The key is a byte[] built with a StructRowKeyBuilder, using FixedLengthByteWritable for 'a', IntWritable for the ID and LongWritable for the timestamp; it contains basically all the information, and the value is just a counter. The key consists of an identifier (a or p), an id of variable length, a date with time in seconds, and a couple of other ids after that (which I don't really care about, as I want to filter by time).

KEY                             VALUE
a 13  2018-01-01T10:00:00 ...   1
a 13  2018-01-02T11:00:00 ...   1
a 13  2018-01-03T12:00:00 ...   1
a 13  2018-01-04T13:00:00 ...   1
a 15  2018-01-01T10:00:00 ...   1
a 15  2018-01-02T11:00:00 ...   1
a 15  2018-01-03T12:00:00 ...   1
a 123 2018-01-01T10:00:00 ...   1
a 123 2018-01-02T11:00:00 ...   1
a 123 2018-01-03T12:00:00 ...   1
a 123 2018-01-04T10:00:00 ...   1
...
p 13  2018-01-01T10:00:00 ...   1
p 13  2018-01-02T10:00:00 ...   1
p 13  2018-01-03T10:00:00 ...   1
p 666 2018-01-01T10:00:00 ...   1
...

I want to get all data for a specific time frame, say between 2018-01-01T10:00:00 and 2018-01-02T12:00:00, for all a's.

So I tried a scan with a start row and an end row set.

StartRow    a 0 2018-01-01T10:00:00
EndRow      a Integer.MAX_VALUE 2018-01-02T12:00:01 (+1 second to make it inclusive)

This did not give me the correct result, as it included everything between the two keys. So the record

KEY                             VALUE
a 13  2018-01-04T13:00:00 ...   1

was included as well (which makes sense).

Setting the start row to a 0 and the end row to a Integer.MAX_VALUE limits the rows returned to only the a's.
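The behaviour described above can be reproduced without HBase. The following is a minimal sketch in plain Java, assuming a hypothetical fixed-width key layout (1 type byte, 4-byte big-endian id, 8-byte big-endian epoch seconds); the real StructRowKeyBuilder encoding may differ. A TreeMap sorted by unsigned byte order stands in for the table, and the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

public class RowKeyRangeDemo {

    // Assumed (hypothetical) layout: type byte, big-endian int id, big-endian long epoch seconds.
    static byte[] key(char type, int id, long epochSeconds) {
        return ByteBuffer.allocate(1 + Integer.BYTES + Long.BYTES)
                .put((byte) type).putInt(id).putLong(epochSeconds).array();
    }

    // Parses an ISO local date-time and interprets it as UTC.
    static long ts(String isoDateTime) {
        return LocalDateTime.parse(isoDateTime).toEpochSecond(ZoneOffset.UTC);
    }

    // Stand-in for a table: rows kept sorted in unsigned lexicographic byte order.
    static TreeMap<byte[], String> sampleTable() {
        Comparator<byte[]> byteOrder = Arrays::compareUnsigned;
        TreeMap<byte[], String> table = new TreeMap<>(byteOrder);
        String[] times = {"2018-01-01T10:00:00", "2018-01-02T11:00:00",
                          "2018-01-03T12:00:00", "2018-01-04T13:00:00"};
        for (char type : new char[] {'a', 'p'}) {
            for (int id : new int[] {13, 15, 123}) {
                for (String time : times) {
                    table.put(key(type, id, ts(time)), type + " " + id + " " + time);
                }
            }
        }
        return table;
    }

    // Range scan: start row inclusive, stop row exclusive.
    static List<String> scan(TreeMap<byte[], String> table, byte[] start, byte[] stop) {
        return new ArrayList<>(table.subMap(start, true, stop, false).values());
    }

    public static void main(String[] args) {
        byte[] start = key('a', 0, ts("2018-01-01T10:00:00"));
        byte[] stop = key('a', Integer.MAX_VALUE, ts("2018-01-02T12:00:01"));
        // Prints every 'a' row, including those from 2018-01-03 and 2018-01-04,
        // because the id bytes are compared before the timestamp bytes.
        scan(sampleTable(), start, stop).forEach(System.out::println);
    }
}
```

Because the id precedes the timestamp in the key, the timestamps in the start and stop rows only matter for rows whose id equals 0 or Integer.MAX_VALUE; every id in between matches regardless of its time.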

How would I go about filtering these rows server side with HBase filters? Since the keys are serialized to byte[], I have no clear idea how to achieve this with filters.

Could anyone point me in the right direction (or, better yet, provide some example code in Java)?

Some code (which unfortunately does not work as I want it to):

...
byte[] fromKey = Bytes.toBytes("a" + 0);
byte[] toKey = Bytes.toBytes("a" + Integer.MAX_VALUE);
Scan scan = new Scan(fromKey, toKey);

int minId = 0;
int maxId = Integer.MAX_VALUE;
final byte[] fromBytes = Bytes.toBytes("a" + minId + dateFromInMillis);
final BinaryPrefixComparator fromBinaryPrefixComparator = new BinaryPrefixComparator(fromBytes);
final Filter fromFilter = new RowFilter(CompareOp.GREATER_OR_EQUAL, fromBinaryPrefixComparator);

final byte[] toBytes = Bytes.toBytes("a" + maxId + dateFromInMillis);
final BinaryPrefixComparator toBinaryPrefixComparator = new BinaryPrefixComparator(toBytes);
final Filter toFilter = new RowFilter(CompareOp.LESS_OR_EQUAL, toBinaryPrefixComparator);

FilterList filterList= new FilterList(FilterList.Operator.MUST_PASS_ALL, fromFilter, toFilter);

scan.setFilter(filterList);
scanner = myTable.getScanner(scan);
...

I tried to emulate your problem using Phoenix. I am not sure how StructRowKeyBuilder creates and stores the key, but if you implement the same thing using a delimited HBase key or a Phoenix composite key, you should be able to get correct results.

Here is what I did:

-- Create table
create table stackoverflow (
    id1 char(1) not null,
    id2 integer not null,
    eventdate Date not null,
    id3 varchar not null,
    id4 varchar not null,
    myvalue integer
    CONSTRAINT my_pk PRIMARY KEY (id1, id2, eventdate,id3, id4));

-- Add data
UPSERT INTO stackoverflow (id1, id2, eventdate,id3, id4, myvalue) VALUES('a', 13, '2018-01-01T10:00:00', 'dummy1', 'dummy2', 1);
.
.
.
UPSERT INTO stackoverflow (id1, id2, eventdate,id3, id4, myvalue) VALUES('p', 13, '2018-01-03T12:00:00', 'dummy1', 'dummy2', 1);
UPSERT INTO stackoverflow (id1, id2, eventdate,id3, id4, myvalue) VALUES('p', 666, '2018-01-01T10:00:00', 'dummy1', 'dummy2', 1);

Next, I ran the following query:

select  * from stackoverflow where id1='a' and id2 between 0 and 2147483647 and eventdate between TO_DATE('2018-01-01T10:00:00') and TO_DATE('2018-01-02T12:00:01');

Here are my results. I can achieve the same using the HBase Java API, but in my case the generated composite key is a concatenated string separated by a '0' delimiter. To me it looks like StructRowKeyBuilder is changing something, because what you are trying to achieve is a very normal use case.

a    13   2018-01-01 10:00:00.000  dummy1  dummy2  1        
a    13   2018-01-02 11:00:00.000  dummy1  dummy2  1        
a    15   2018-01-01 10:00:00.000  dummy1  dummy2  1        
a    15   2018-01-02 11:00:00.000  dummy1  dummy2  1        
a    123  2018-01-01 10:00:00.000  dummy1  dummy2  1        
a    123  2018-01-02 11:00:00.000  dummy1  dummy2  1        
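For a binary key rather than a delimited one, the time condition has to be checked against the bytes at the timestamp's position inside the key. The following plain-Java sketch shows only that byte-level predicate, under the assumption of a hypothetical fixed-width layout (1 type byte, 4-byte big-endian id, 8-byte big-endian epoch seconds); it does not use the HBase API, and the actual StructRowKeyBuilder encoding would need to be confirmed before the same logic is moved into a server-side filter.

```java
import java.nio.ByteBuffer;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class KeyTimestampFilter {

    // Assumed (hypothetical) layout: type byte, 4-byte id, then the 8-byte timestamp.
    static final int TS_OFFSET = 1 + Integer.BYTES;
    static final int KEY_PREFIX_LENGTH = TS_OFFSET + Long.BYTES;

    static byte[] key(char type, int id, long epochSeconds) {
        return ByteBuffer.allocate(KEY_PREFIX_LENGTH)
                .put((byte) type).putInt(id).putLong(epochSeconds).array();
    }

    // Parses an ISO local date-time and interprets it as UTC.
    static long ts(String isoDateTime) {
        return LocalDateTime.parse(isoDateTime).toEpochSecond(ZoneOffset.UTC);
    }

    // Accepts a row key if its type matches and its embedded timestamp lies in [from, to].
    static boolean accept(byte[] rowKey, char type, long fromInclusive, long toInclusive) {
        if (rowKey.length < KEY_PREFIX_LENGTH || rowKey[0] != (byte) type) {
            return false;
        }
        // Read the timestamp from its fixed offset, skipping the id bytes entirely.
        long timestamp = ByteBuffer.wrap(rowKey, TS_OFFSET, Long.BYTES).getLong();
        return timestamp >= fromInclusive && timestamp <= toInclusive;
    }

    public static void main(String[] args) {
        long from = ts("2018-01-01T10:00:00");
        long to = ts("2018-01-02T12:00:00");
        System.out.println(accept(key('a', 13, ts("2018-01-02T11:00:00")), 'a', from, to)); // true
        System.out.println(accept(key('a', 13, ts("2018-01-04T13:00:00")), 'a', from, to)); // false
        System.out.println(accept(key('p', 13, ts("2018-01-01T10:00:00")), 'a', from, to)); // false
    }
}
```

The check ignores the id, so it selects the same rows as the Phoenix query above for any id, provided the timestamp really sits at a fixed offset in the key.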

Hope this helps.
