How to skip rows in an HBase Scan?

I am implementing simple pagination: go to page 1, page 2, page 3 and so on.

In the HBase Book I read that there is a PageFilter whose constructor takes one parameter indicating the number of rows to return. The question is how to go directly to, for example, page 5, skipping pageSize*currentPageNumber rows. The example given in the HBase Book looks like sequential pagination, i.e. you can't go to page 5 directly.

Is there a way to skip rows in HBase?

Thanks in advance.

The PageFilter doesn't provide any offset functionality. It works just like a limit clause, stopping the scan operation once you have enough data.

It's important to note that HBase doesn't know how many rows a table has; you have to scan the whole table to get that count. This alone, among other things, makes it impossible to paginate the data by offset (because you don't know the total page count or the offset of each row). Don't see it as a drawback, because it has a massive impact on performance when you write tons of data.

Having said that, pagination over millions (or billions) of rows doesn't make sense. You should design your tables so that you can always provide a starting point (rowkey), and your scan operation can start reading from there. You don't need to know the whole row key: both start and stop rows can be just a prefix (e.g. if your data is naturally sorted by an 8-byte timestamp, you can use it to fast-forward to previous hours, days, months...).
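To illustrate the prefix idea, here is a minimal sketch in plain Java. It does not talk to HBase: a TreeMap ordered by unsigned byte comparison stands in for the table, and the rowkey layout (8-byte big-endian timestamp followed by a suffix), the class name and the helper names are all assumptions made for the example. In a real client the two prefixes would be passed as the start and stop rows of the Scan.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper: simulates a range scan bounded by timestamp prefixes.
class TimestampPrefixScan {

    // Assumed rowkey layout: 8-byte big-endian timestamp, then an arbitrary suffix.
    // Big-endian encoding keeps non-negative timestamps in chronological byte order.
    static byte[] rowKey(long timestampMillis, String suffix) {
        byte[] tail = suffix.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + tail.length)
                .putLong(timestampMillis)
                .put(tail)
                .array();
    }

    // A start or stop row does not need the suffix: the timestamp prefix is enough.
    static byte[] prefix(long timestampMillis) {
        return ByteBuffer.allocate(Long.BYTES).putLong(timestampMillis).array();
    }

    // Stand-in for a table: keys sorted as unsigned bytes, like HBase rowkeys.
    static TreeMap<byte[], String> newTable() {
        Comparator<byte[]> unsignedBytes = Arrays::compareUnsigned;
        return new TreeMap<>(unsignedBytes);
    }

    // Returns the values of all rows with fromMillis <= timestamp < toMillis.
    static List<String> scan(TreeMap<byte[], String> table, long fromMillis, long toMillis) {
        return new ArrayList<>(
                table.subMap(prefix(fromMillis), true, prefix(toMillis), false).values());
    }
}
```

Because a prefix sorts before every full key that starts with it, using the bare timestamp as the start row includes all rows from that instant onward, and using it as the stop row excludes them.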

If you cannot provide any starting point (even partially), a very simple solution that could work for you is to retrieve the records in batches (e.g. batches of 1000 items, which would be enough for 50 pages of 20 items that can easily be handled client-side). Then, when you have reached the last page of the batch, just use the rowkey of the last item as the starting point for the next scan operation, which should retrieve another batch of 1000 rows, and so on. The only drawback is that it is costly to go straight to higher pages, because you need to load the previous batches first.
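The batch approach can be sketched as follows. This is plain Java with a NavigableMap standing in for the table, so it only models the logic; the class BatchPager and its methods are hypothetical names. In a real HBase client, the tailMap(lastRowKey, false) step would correspond to a Scan that starts just after the last rowkey seen and is capped at the batch size (for instance with Scan.withStartRow(lastRow, false) plus a limit or a PageFilter, assuming a client version that offers them).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical client-side pager: loads rows in batches and serves pages from them.
class BatchPager {
    private final NavigableMap<String, String> table; // stand-in for an HBase table
    private final int batchSize;
    private final int pageSize;
    private final List<String> loadedRowKeys = new ArrayList<>();
    private String lastRowKey = null;   // rowkey of the last row loaded so far
    private boolean exhausted = false;  // true once a scan returned a short batch
    private int scansIssued = 0;

    BatchPager(NavigableMap<String, String> table, int batchSize, int pageSize) {
        this.table = table;
        this.batchSize = batchSize;
        this.pageSize = pageSize;
    }

    // One "scan": start right after the last rowkey seen, stop after batchSize rows.
    private void loadNextBatch() {
        NavigableMap<String, String> rest =
                (lastRowKey == null) ? table : table.tailMap(lastRowKey, false);
        scansIssued++;
        int read = 0;
        for (String rowKey : rest.keySet()) {
            if (read == batchSize) break;
            loadedRowKeys.add(rowKey);
            lastRowKey = rowKey;
            read++;
        }
        if (read < batchSize) exhausted = true;
    }

    // Returns the rowkeys of the given 1-based page, loading batches as needed.
    List<String> page(int pageNumber) {
        int from = (pageNumber - 1) * pageSize;
        int to = from + pageSize;
        while (loadedRowKeys.size() < to && !exhausted) {
            loadNextBatch(); // jumping to a high page forces all earlier batches
        }
        if (from >= loadedRowKeys.size()) return List.of();
        return new ArrayList<>(loadedRowKeys.subList(from, Math.min(to, loadedRowKeys.size())));
    }

    int scansIssued() {
        return scansIssued;
    }
}
```

With batchSize 1000 and pageSize 20, pages 1 to 50 are served from the first scan, and asking for page 51 triggers a second one. The sketch keeps every loaded rowkey in memory for simplicity; a real implementation would probably keep only the current batch.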
