
HBase column wide scanning and fetching

Let's say I've created a table

rowkey (attrId+attr_value) //compound key

column => doc:doc1, doc:doc2, ...

When I use the scan feature, I fetch one row at a time inside the iterator. What if the column qualifiers for a row reach millions of entries? How do you loop through that, and will there be a cache issue?
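Roughly, the loop I mean looks like this (a sketch using the standard HBase client API; the table name is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "attr_index");   // placeholder table name
Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("doc"));            // the wide doc:* family
ResultScanner scanner = table.getScanner(scan);
try {
    for (Result row : scanner) {
        // each Result holds EVERY doc:docN column of one row in memory
    }
} finally {
    scanner.close();
}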

Thanks.

You can work around giant row fetches with a mixture of scans and column filters:

Scan s = new Scan();
// start row == stop row restricts the scan to this single row
// (the classic client treats an equal start/stop pair as a get-style scan)
s.setStartRow(Bytes.toBytes("some-row-key"));
s.setStopRow(Bytes.toBytes("some-row-key"));
// return only qualifiers in the range [doc0000, doc0100)
Filter f = new ColumnRangeFilter(Bytes.toBytes("doc0000"), true,
                                 Bytes.toBytes("doc0100"), false);
s.setFilter(f);

Source: http://hadoop-hbase.blogspot.com/2012/01/hbase-intra-row-scanning.html
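Building on the linked intra-row scanning trick, you can page through a huge row by re-issuing the scan and sliding the filter's lower bound just past the last qualifier you saw. A sketch (the page size, the table variable, and the rescan-per-page loop are my own assumptions, not taken from the linked post):

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnRangeFilter;
import org.apache.hadoop.hbase.util.Bytes;

final int PAGE = 1000;
byte[] row = Bytes.toBytes("some-row-key");
byte[] after = null;                        // last qualifier of the previous page
while (true) {
    Scan s = new Scan(row, row);            // single-row scan, as above
    // null bounds are open-ended; the lower bound is exclusive after page one
    s.setFilter(new ColumnRangeFilter(after, after == null, null, false));
    s.setBatch(PAGE);                       // at most PAGE cells per Result
    ResultScanner scanner = table.getScanner(s);  // table is an HTable opened elsewhere
    Result page = scanner.next();           // first batch of this range
    scanner.close();
    if (page == null || page.isEmpty()) break;
    for (KeyValue kv : page.raw()) {
        // process kv.getQualifier() / kv.getValue() here
        after = kv.getQualifier();          // remember where this page ended
    }
}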

Scans fetch rows. You can qualify a scan so that it only fetches given qualifiers or given families, but then that is all that will be returned from the scan (and you can only filter on data that is included in a scan).
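For example (a minimal sketch; the family and qualifier names are placeholders):

Scan s = new Scan();
s.addFamily(Bytes.toBytes("doc"));                        // every qualifier under doc:*
// or narrow it down to a single column:
s.addColumn(Bytes.toBytes("doc"), Bytes.toBytes("doc1")); // only doc:doc1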

If you have potentially millions of columns in a single row, that is a problem in its own right: returning the row means a very large network transfer. If the row grows beyond your region size it can also cause OOM errors on your region servers (a row is never split across regions), and it leads to inefficient storage (one row per region).

However, ignoring all of that, you can loop through the column families and qualifiers in the client. You can get a Map from the Result that maps families to qualifiers to values, as sketched below. But that is probably not what you really want to do.
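That client-side loop looks roughly like this (a sketch; it assumes a Result r already returned by the scanner):

import java.util.Map;
import java.util.NavigableMap;
import org.apache.hadoop.hbase.util.Bytes;

// family -> (qualifier -> value), latest version of each cell only
NavigableMap<byte[], NavigableMap<byte[], byte[]>> map = r.getNoVersionMap();
for (Map.Entry<byte[], NavigableMap<byte[], byte[]>> family : map.entrySet()) {
    for (Map.Entry<byte[], byte[]> column : family.getValue().entrySet()) {
        String qualifier = Bytes.toString(column.getKey()); // e.g. "doc1"
        byte[] value = column.getValue();
        // process one doc:<qualifier> -> value pair
    }
}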

You can also use Scan.setBatch to limit the number of columns returned from a row at a time.
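With batching enabled, one logical row can come back as several consecutive Result objects, so the loop has to account for that (a sketch; the batch size and table variable are assumptions):

Scan s = new Scan();
s.setBatch(100);                             // at most 100 cells per Result
ResultScanner scanner = table.getScanner(s); // table is an HTable opened elsewhere
for (Result partial : scanner) {
    // consecutive Results can share a row key: they are slices of the
    // same wide row, delivered 100 columns at a time
    byte[] rowKey = partial.getRow();
    // process partial.raw() ...
}
scanner.close();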
