简体   繁体   中英

HBase Filter on StructRowKey byte[] Array key

HBase Filter by part of Row key

This is my table (Key is byte[] using a StructRowKeyBuilder with FixedLengthByteWritable for 'a', IntWritable for the ID and LongWritable for the timestamp and contains basically all the info, value is just a counter) The key consists of an identifier (a or p), an id of variable length, a date with time in seconds and a couple of other ids after that (about which I don't really care as I want to filter for time) .

KEY                             VALUE
a 13  2018-01-01T10:00:00 ...   1
a 13  2018-01-02T11:00:00 ...   1
a 13  2018-01-03T12:00:00 ...   1
a 13  2018-01-04T13:00:00 ...   1
a 15  2018-01-01T10:00:00 ...   1
a 15  2018-01-02T11:00:00 ...   1
a 15  2018-01-03T12:00:00 ...   1
a 123 2018-01-01T10:00:00 ...   1
a 123 2018-01-02T11:00:00 ...   1
a 123 2018-01-03T12:00:00 ...   1
a 123 2018-01-04T10:00:00 ...   1
...
p 13  2018-01-01T10:00:00 ...   1
p 13  2018-01-02T10:00:00 ...   1
p 13  2018-01-03T10:00:00 ...   1
p 666 2018-01-01T10:00:00 ...   1
...

I want to get all data for a specific time frame, say between 2018-01-01T10:00:00 and 2018-01-02T12:00:00 for all a's.

So, I tried with scan setting start and end row.

StartRow    **a 0 2018-01-01T10:00:00** 
EndRow      **a Integer.MAX_VALUE 2018-01-02T:12:00:01 (+1 second to make it inclusive)**

This did not give me the correct result, as it included everything between the two keys. So record

KEY VALUE a 13 2018-01-04T13:00:00 ... 1

was included as well. (Which makes sense)

Setting the start row to a 0 and the end row to an Integer. MaxValue limits the number of rows returned to only a s.

How would I go about filtering these rows server side with HBase filters? Since the keys are serialized to byte[] I have no clear idea on how to achieve this with filters.

Anyone who could point me in the right direction? (or better yet provide some example code in java)

Some code (which unfortunately does not work as I want it to):

...
byte[] fromKey = Bytes.toBytes("a" + 0);
byte[] toKey = Bytes.toBytes("a" + Integer.MAX_VALUE);
Scan scan = new Scan(fromKey, toKey);

int minId = 0;
int maxId = Integer.MAX_VALUE;
final byte[] fromBytes = Bytes.toBytes("a" + minId + dateFromInMillis);
final BinaryPrefixComparator fromBinaryPrefixComparator = new BinaryPrefixComparator(fromBytes);
final Filter fromFilter = new RowFilter(CompareOp.GREATER_OR_EQUAL, fromBinaryPrefixComparator);

final byte[] toBytes = Bytes.toBytes("a" + maxId + dateFromInMillis);
final BinaryPrefixComparator toBinaryPrefixComparator = new BinaryPrefixComparator(toBytes);
final Filter toFilter = new RowFilter(CompareOp.LESS_OR_EQUAL, toBinaryPrefixComparator);

FilterList filterList= new FilterList(FilterList.Operator.MUST_PASS_ALL, fromFilter, toFilter);

scan.setFilter(filterList);
scanner = myTable.getScanner(scan);
...

I tried to emulate your problem using Phoenix, I am not sure how StructRowKeyBuilder creates and stores key but if you implement same using a delimited HBase key or using Phoenix composite you should be able to get correct results.

Here is what I did -

// Create table    
create table stackoverflow (
    id1 char(1) not null,
    id2 integer not null,
    eventdate Date not null,
    id3 varchar not null,
    id4 varchar not null,
    myvalue integer
    CONSTRAINT my_pk PRIMARY KEY (id1, id2, eventdate,id3, id4));

// add data
UPSERT INTO stackoverflow (id1, id2, eventdate,id3, id4, myvalue) VALUES('a', 13, '2018-01-01T10:00:00', 'dummy1', 'dummy2', 1);
.
.
.
UPSERT INTO stackoverflow (id1, id2, eventdate,id3, id4, myvalue) VALUES('p', 13, '2018-01-03T12:00:00', 'dummy1', 'dummy2', 1);
UPSERT INTO stackoverflow (id1, id2, eventdate,id3, id4, myvalue) VALUES('p', 666, '2018-01-01T10:00:00', 'dummy1', 'dummy2', 1);

Next created following query -

select  * from stackoverflow where id1='a' and id2 between 0 and 2147483647 and eventdate between TO_DATE('2018-01-01T10:00:00') and TO_DATE('2018-01-02T12:00:01');

Here are my results, I can achieve same using HBase java API but in my case the composite key generated is concatenated string separated by '0' delimiter. TO me it looks like StructRowKeyBuilder is changing something because what you are trying to achieve is very normal usecase scenario.

a    13   2018-01-01 10:00:00.000  dummy1  dummy2  1        
a    13   2018-01-02 11:00:00.000  dummy1  dummy2  1        
a    15   2018-01-01 10:00:00.000  dummy1  dummy2  1        
a    15   2018-01-02 11:00:00.000  dummy1  dummy2  1        
a    123  2018-01-01 10:00:00.000  dummy1  dummy2  1        
a    123  2018-01-02 11:00:00.000  dummy1  dummy2  1        

Hope this helps.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM