
How to improve scan performance in HBase?

I am using HBase 0.96 for analytics. I fetch data from HBase by scanning a row-key range (defined by startRow and stopRow) and applying SingleColumnValueFilters on column values.

A single request takes 5-6 minutes to scan 1,500,000 records, and concurrent requests are not handled at all.
How can I improve scan performance in HBase?

We have 3 DataNodes and 2 master nodes on Amazon.

Below is my code:

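// Imports for the HBase 0.96 client API
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

Scan s = new Scan();
s.setCaching(10000); // rows returned to the client per RPC

// Scan only the row-key range [start_date, end_date)
s.setStartRow(Bytes.toBytes(start_date));
s.setStopRow(Bytes.toBytes(end_date));

// Every filter in the list must pass for a row to be returned
FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);

// Keep rows whose log:ad_id equals ad_id.
// Note: by default a row that has no log:ad_id column also passes;
// call adFilter.setFilterIfMissing(true) to drop such rows.
SingleColumnValueFilter adFilter = new SingleColumnValueFilter(
    Bytes.toBytes("log"), Bytes.toBytes("ad_id"),
    CompareOp.EQUAL, Bytes.toBytes(ad_id));
filters.addFilter(adFilter);

// Keep rows whose log:advertiser_id equals adver_id
// (a distinct variable name: redeclaring "filter" does not compile)
SingleColumnValueFilter advertiserFilter = new SingleColumnValueFilter(
    Bytes.toBytes("log"), Bytes.toBytes("advertiser_id"),
    CompareOp.EQUAL, Bytes.toBytes(adver_id));
filters.addFilter(advertiserFilter);

s.setFilter(filters);

ResultScanner rs = click_table.getScanner(s);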

How can the above code be used in a coprocessor?

If you want to scan based on column values, these are the best options:

  1. Solr (CDH Search) https://wiki.apache.org/solr/
  2. Hindex (coprocessor based approach) https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/10/30/coprocessor-based-secondary-index-on-hbase
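Both options avoid reading every row in the key range by indexing the column values. A SingleColumnValueFilter, by contrast, is evaluated on the region server against every row between startRow and stopRow, so the scan cost grows with the size of the range rather than with the number of matches. The sketch below is a minimal, self-contained illustration of that difference, using in-memory `TreeMap`s as stand-ins for sorted HBase tables (no HBase API is used). The `adId|date|eventId` index key layout and the sample data are hypothetical, and this is not how Hindex or Solr are implemented internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SecondaryIndexSketch {
    // Simulated data table, sorted by row key like HBase: "date|eventId" -> ad_id value
    static final NavigableMap<String, String> dataTable = new TreeMap<>();
    // Simulated index table (hypothetical layout): "adId|date|eventId" -> data-table row key
    static final NavigableMap<String, String> indexTable = new TreeMap<>();
    // Rows touched by the most recent scan (stands in for server-side work)
    static int lastRowsExamined = 0;

    static void put(String date, String eventId, String adId) {
        String dataKey = date + "|" + eventId;
        dataTable.put(dataKey, adId);
        indexTable.put(adId + "|" + dataKey, dataKey); // the index must be updated on every write
    }

    // Filter approach: every row in [startDate, endDate) is read and tested
    static List<String> filteredScan(String adId, String startDate, String endDate) {
        lastRowsExamined = 0;
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e
                : dataTable.subMap(startDate, true, endDate, false).entrySet()) {
            lastRowsExamined++;
            if (e.getValue().equals(adId)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // Index approach: only index rows for this adId inside the date range are read
    static List<String> indexScan(String adId, String startDate, String endDate) {
        lastRowsExamined = 0;
        List<String> hits = new ArrayList<>();
        String from = adId + "|" + startDate;
        String to = adId + "|" + endDate;
        for (String dataKey : indexTable.subMap(from, true, to, false).values()) {
            lastRowsExamined++;
            hits.add(dataKey);
        }
        return hits;
    }

    // Hypothetical sample data: 3 days x 1000 events spread over 100 ads
    static void populate() {
        dataTable.clear();
        indexTable.clear();
        String[] days = {"2014-01-01", "2014-01-02", "2014-01-03"};
        for (String day : days) {
            for (int i = 0; i < 1000; i++) {
                put(day, String.format("e%04d", i), String.format("ad%03d", i % 100));
            }
        }
    }

    public static void main(String[] args) {
        populate();
        List<String> viaFilter = filteredScan("ad007", "2014-01-01", "2014-01-03");
        int filterWork = lastRowsExamined;
        List<String> viaIndex = indexScan("ad007", "2014-01-01", "2014-01-03");
        int indexWork = lastRowsExamined;
        System.out.println("filter: " + viaFilter.size() + " hits, " + filterWork + " rows read");
        System.out.println("index:  " + viaIndex.size() + " hits, " + indexWork + " rows read");
        System.out.println("same rows: " + viaFilter.equals(viaIndex));
    }
}
```

Both scans return the same 20 rows, but the filtered scan reads all 2000 rows in the two-day range while the index scan reads only the 20 matching index entries. The trade-off is that every write must also update the index table, and keeping the two consistent is the job of the application (or of a coprocessor, as in the Hindex approach).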

Try calling scan.setCaching(100000) when running queries. Caching sets the number of rows transferred from the region server to the client per RPC.
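As a rough illustration of why caching matters: each RPC returns at most `caching` rows, so the minimum number of round trips is the number of returned rows divided by the caching value, rounded up. A minimal sketch, assuming all 1,500,000 rows from the question reach the client (rows dropped by a server-side filter are not transferred) and ignoring the extra RPCs needed to open a scanner on each region:

```java
public class ScanCachingMath {
    // Minimum number of scanner RPCs needed to return `rows` rows when each
    // RPC carries at most `caching` rows (ceiling division).
    // Simplification: ignores region boundaries and any per-RPC size limits.
    static long rpcCount(long rows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_500_000L; // row count from the question
        for (int caching : new int[] {1, 100, 10_000, 100_000}) {
            System.out.println("caching=" + caching + " -> " + rpcCount(rows, caching) + " RPCs");
        }
    }
}
```

Going from 10000 to 100000 cuts the round trips from 150 to 15, but each call then holds more rows in client and region server memory, and a single call that takes too long can exceed the scanner lease timeout, so larger is not automatically better.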

Edit: Also try tuning the batch size (Scan.setBatch) and buffer sizes according to your network bandwidth. Every application has a different data structure and needs different tuning parameters, so experiment with these values for your data.
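`Scan.setBatch(n)` caps the number of columns returned in one `Result`, so a wide row is split into several partial `Result`s. Because caching counts `Result` objects, batching also changes the number of round trips. A back-of-the-envelope sketch; the 10 columns per row is a hypothetical figure (the question does not state the row width), and the RPC formula is an approximation:

```java
public class ScanBatchMath {
    // Result objects produced for one row with `columns` cells when Scan.setBatch(batch)
    // is used; batch <= 0 means batching is off and the whole row comes back at once.
    static long resultsPerRow(int columns, int batch) {
        if (batch <= 0 || columns <= batch) {
            return 1;
        }
        return (columns + batch - 1) / batch;
    }

    // Approximate RPC count: caching limits Result objects per RPC, so splitting
    // a wide row into several Results also increases the number of round trips.
    static long approxRpcs(long rows, int columns, int batch, int caching) {
        long results = rows * resultsPerRow(columns, batch);
        return (results + caching - 1) / caching;
    }

    public static void main(String[] args) {
        System.out.println(resultsPerRow(10, 3));                  // 4 partial Results (3+3+3+1)
        System.out.println(approxRpcs(1_500_000L, 10, 3, 10_000)); // 600
        System.out.println(approxRpcs(1_500_000L, 10, 0, 10_000)); // 150
    }
}
```

Batching mainly protects client memory when rows are very wide; for narrow rows it does not reduce the amount of data read and is unlikely to speed up the scan.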

If performance is still the same, try fetching the data in parallel, for example by splitting the row-key range into sub-ranges and scanning them concurrently. This might help.
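One way to fetch in parallel is to split the scan's row-key range into contiguous sub-ranges, scan each on its own thread, and concatenate the results in key order. A minimal sketch, assuming row keys start with a `yyyy-MM-dd` date (suggested by `start_date`/`end_date` in the question, but an assumption) and using a hypothetical `scanRange` method over an in-memory `TreeMap` in place of a real HBase scan:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScanSketch {
    // Stand-in for the HBase table: sorted row keys "yyyy-MM-dd|eventId" -> value
    static final NavigableMap<String, String> table = new TreeMap<>();

    // Hypothetical stand-in for one HBase scan over [startRow, stopRow)
    static List<String> scanRange(String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    // Split [start, end) into at most `parts` contiguous day ranges and scan them concurrently
    static List<String> parallelScan(LocalDate start, LocalDate end, int parts) throws Exception {
        long days = end.toEpochDay() - start.toEpochDay();
        long step = Math.max(1, (days + parts - 1) / parts);
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (LocalDate from = start; from.isBefore(end); from = from.plusDays(step)) {
                LocalDate to = from.plusDays(step).isAfter(end) ? end : from.plusDays(step);
                final String startRow = from.toString();
                final String stopRow = to.toString();
                futures.add(pool.submit(() -> scanRange(startRow, stopRow)));
            }
            // Futures are collected in key order, so the merged list stays sorted
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                all.addAll(f.get());
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical sample data: 10 days x 100 events starting at 2014-01-01
    static void populate() {
        table.clear();
        LocalDate start = LocalDate.of(2014, 1, 1);
        for (int d = 0; d < 10; d++) {
            for (int i = 0; i < 100; i++) {
                table.put(start.plusDays(d) + "|e" + String.format("%03d", i), "row");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        populate();
        LocalDate start = LocalDate.of(2014, 1, 1);
        List<String> rows = parallelScan(start, start.plusDays(10), 4);
        System.out.println(rows.size());
        System.out.println(rows.equals(scanRange("2014-01-01", "2014-01-11")));
    }
}
```

With a real cluster, splitting at region boundaries (for example from `HTable.getStartKeys()`) usually spreads the work across region servers better than splitting by date, and each worker thread should use its own `HTable`, since `HTable` instances are not thread-safe.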

HTH

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM