How to increase scan speed in HBase

I am new to Apache HBase and am using hbase-0.98.13. I have created a table named sample with the column family sample_family and loaded the output of a Pig script into it. When I scan the table filtering on one of the columns in the column family, the scan takes more than 2 minutes.

Here is the query:

scan 'sample', {FILTER=>"SingleColumnValueFilter('sample_family','id',=,'binary:1000')"}

Can anyone tell me how to bring this down to one or two seconds?

Are there any configuration changes I should make for this? Any help would be appreciated.

There's no silver bullet for making a search in HBase fast. The scan in your example has to iterate over every row in the table and test each one against the filter, which is why it takes significant time on large tables. HBase also has no built-in secondary indexes that would speed up a search by specific columns.

The most effective way to improve scan performance is to design your row keys properly. HBase keeps rows sorted by row key internally, and you can specify start and stop rows for a scan, so it is crucial to design row keys around your most frequent search criteria. In your question you search for rows where the column id has the value 1000. You could put this id into the row key (but make sure you avoid region hotspotting).
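To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model, not HBase client code: the ToyTable class, the row-key formats and the row count are all made up for illustration. It only mimics two things described above, namely that rows are stored sorted by row key and that a scan can be bounded by a start row (inclusive) and a stop row (exclusive). The zero-padding of the id is my own assumption, added so that lexicographic key order matches numeric order.

```python
from bisect import bisect_left

NUM_ROWS = 100_000  # hypothetical table size


class ToyTable:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self, rows):
        self.rows = sorted(rows, key=lambda row: row[0])
        self.keys = [key for key, _ in self.rows]

    def filtered_full_scan(self, column, value):
        """Mimics a scan with a column-value filter: every row is read."""
        rows_read = 0
        matches = []
        for key, columns in self.rows:
            rows_read += 1
            if columns.get(column) == value:
                matches.append(key)
        return matches, rows_read

    def range_scan(self, start_row, stop_row):
        """Mimics a scan bounded by start row (inclusive) and stop row (exclusive)."""
        lo = bisect_left(self.keys, start_row)
        hi = bisect_left(self.keys, stop_row)
        return self.keys[lo:hi], hi - lo


# Layout A (hypothetical): opaque row keys, id stored only as a column value.
by_column = ToyTable((f"row-{n:08d}", {"id": str(n)}) for n in range(NUM_ROWS))

# Layout B (hypothetical): the zero-padded id is the row key itself.
by_key = ToyTable((f"{n:010d}", {"id": str(n)}) for n in range(NUM_ROWS))

matches_a, read_a = by_column.filtered_full_scan("id", "1000")
matches_b, read_b = by_key.range_scan("0000001000", "0000001001")

print("filter on column:", matches_a, "rows read:", read_a)
print("range on row key:", matches_b, "rows read:", read_b)
```

Both lookups find the same single row, but the filtered scan reads all 100,000 rows while the key-range scan reads one. In the HBase shell the second pattern corresponds to passing STARTROW and STOPROW to scan, or to a plain get when the row key is known exactly.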

