
HBase filters: do they perform well?

In our case, we designed the row key around our initial set of queries: we query against the row key and leave the column families and columns alone.

For example, the row key looks something like:

%userid%_%timestamp%

We run queries like:

select columnFamily{A,B,C} from userid=blabla and blabla < timestamp < blabla 

Performance is fine, because that is what HBase is built for: row key lookups.
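As a rough illustration of why this kind of query is cheap, the userid and timestamp bounds can be turned directly into a start row and a stop row, so the scan only touches one contiguous slice of the table. The sketch below is plain Java with no HBase dependency. The class name `RowKeyRange` and the zero-padded 19-digit timestamp encoding are assumptions made for illustration, since the question does not say how the timestamp is encoded. The resulting byte arrays are what one would typically hand to the scan as its start and stop rows.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds scan boundaries for keys shaped like userid_timestamp.
final class RowKeyRange {
    // Assumption: the timestamp is zero-padded to 19 digits (the width of
    // Long.MAX_VALUE) so that lexicographic order matches numeric order.
    static String rowKey(String userId, long timestamp) {
        if (timestamp < 0) {
            throw new IllegalArgumentException("timestamp must be non-negative");
        }
        return userId + "_" + String.format("%019d", timestamp);
    }

    // Inclusive start row for the condition "timestamp > lowerExclusive".
    static byte[] startRow(String userId, long lowerExclusive) {
        return rowKey(userId, lowerExclusive + 1).getBytes(StandardCharsets.UTF_8);
    }

    // Exclusive stop row for the condition "timestamp < upperExclusive".
    static byte[] stopRow(String userId, long upperExclusive) {
        return rowKey(userId, upperExclusive).getBytes(StandardCharsets.UTF_8);
    }
}
```

Without fixed-width padding, a key ending in `_10` would sort before one ending in `_9`, and the range would no longer match the intended timestamps.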

But as new requirements come in, we need to query against more fields, namely the columns, like:

select * from userid=blabla and blabla < timestamp < blabla and A=blabla and B=blabla and C=blabla

We started using HBase filters. We tried EqualFilter on one of the columns, A, and it works fine functionally.

I have some general concerns here, given the row key we have:

  1. Can we just keep adding filters against all columns A, B, C to meet different query needs? Does the number of filters added to the HBase query slow down read performance?
  2. How dramatic is the impact, if there is one?
  3. Can somebody explain how we should make the best use of HBase filters from a performance perspective?

1) Can we just keep adding filters against all columns A, B, C to meet different query needs? Does the number of filters added to the HBase query slow down read performance?

Yes, you can do this. It will affect performance, depending on the size of the data set and which filters you use.

2) How dramatic is the impact, if there is one?

The less data you return the better. You don't want to fetch data that you don't need. Filters help you return only the data that you need.

3) Can somebody explain how we should make the best use of HBase filters from a performance perspective?

It is best to use filters such as prefix filters, filters that exactly match a specific value (or qualifier, column, etc.), or filters that do a greater-than/less-than comparison against the data. These types of filters do not need to look at all the data in each row or table to return the proper results. Avoid regex filters, because the regular expression must be evaluated against every piece of data the filter looks at, and that can be taxing over a large data set.
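To make the difference concrete, here is a minimal plain-Java model, not HBase code. A `TreeMap` stands in for HBase's sorted row keys, and the class `PrefixVersusRegex` and its sample data are made up for illustration. A prefix lookup can jump to the matching slice and stop when it leaves it, while a regex has to be tested against every key.

```java
import java.util.TreeMap;
import java.util.regex.Pattern;

// Plain-Java model of a sorted key space; an illustration, not HBase code.
final class PrefixVersusRegex {
    // Visits only the keys in [prefix, prefix + '\uffff'), relying on sort order.
    static int keysVisitedByPrefix(TreeMap<String, String> rows, String prefix) {
        int visited = 0;
        for (String key : rows.subMap(prefix, prefix + Character.MAX_VALUE).keySet()) {
            visited++;
        }
        return visited;
    }

    // A regex has no sort-order shortcut, so every key is tested.
    static int keysVisitedByRegex(TreeMap<String, String> rows, Pattern pattern) {
        int visited = 0;
        for (String key : rows.keySet()) {
            visited++;
            pattern.matcher(key).matches(); // evaluated once per key
        }
        return visited;
    }

    // Nine sample rows: three users with three timestamps each.
    static TreeMap<String, String> sampleRows() {
        TreeMap<String, String> rows = new TreeMap<>();
        for (String user : new String[] {"alice", "bob", "carol"}) {
            for (int t = 1; t <= 3; t++) {
                rows.put(user + "_" + t, "value");
            }
        }
        return rows;
    }
}
```

With the sample rows, the prefix `bob_` touches three keys, while a pattern such as `bob_.*` is evaluated against all nine.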

Also, Lars George, the author of the HBase book, mentioned that people are moving more toward coprocessors than toward filters. You might also want to look at coprocessors.

1) Can we just keep adding filters against all columns A, B, C to meet different query needs? Does the number of filters added to the HBase query slow down read performance?

Yes, you can add a filter for every column, but it will surely affect the performance of your query if you have a huge amount of data stored. Try to avoid column filters, because every column filter you add increases the number of per-column comparisons.

2) How dramatic is the impact, if there is one?

Filters help you reduce your result set, so you fetch only the data you need.

3) Can somebody explain how we should make the best use of HBase filters from a performance perspective?

- Row filters (including prefix filters) are the most efficient HBase filters, because they do not need to look at every record. So design your row key to include the components you query on most often.
- Value filters are the least efficient, because they have to compare the values of the columns.
- The order of filters matters. If you add several filters to a filter list, the order in which you add them affects performance. For example, with three filters in a query, once the first filter is applied the second has less data to check, and the third less still.

So add the most efficient filters first, i.e. the row key filters, and the others after them.
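The ordering argument can be sketched with a small plain-Java model of an "all filters must pass" list. This is not the HBase `FilterList` class. The names `FilterOrderSketch`, `countPassing`, and `counted`, and the sample rows, are hypothetical. The counter records how often the more expensive value check runs, depending on whether the cheap row key check comes first.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Plain-Java model of an "all filters must pass" list; an illustration only.
final class FilterOrderSketch {
    // Counts passing rows; a row stops being checked at its first failing filter.
    static <T> int countPassing(List<T> rows, List<Predicate<T>> filters) {
        int passing = 0;
        for (T row : rows) {
            boolean ok = true;
            for (Predicate<T> filter : filters) {
                if (!filter.test(row)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                passing++;
            }
        }
        return passing;
    }

    // Wraps a predicate so that each evaluation increments the counter.
    static <T> Predicate<T> counted(Predicate<T> inner, AtomicInteger counter) {
        return row -> {
            counter.incrementAndGet();
            return inner.test(row);
        };
    }

    // Returns how many times the value check runs for the given ordering.
    static int valueChecks(boolean keyFilterFirst) {
        // Each sample row is written as "rowkey=value" for brevity.
        List<String> rows = Arrays.asList("alice_1=x", "alice_2=y", "bob_1=x", "bob_2=x");
        AtomicInteger valueCalls = new AtomicInteger();
        Predicate<String> keyFilter = (String r) -> r.startsWith("alice_");
        Predicate<String> valueFilter = counted((String r) -> r.endsWith("=x"), valueCalls);
        List<Predicate<String>> ordered = keyFilterFirst
                ? Arrays.asList(keyFilter, valueFilter)
                : Arrays.asList(valueFilter, keyFilter);
        countPassing(rows, ordered);
        return valueCalls.get();
    }
}
```

With the row key check first, the value check runs only for the two `alice_` rows; with the order reversed, it runs for all four rows, although the same single row passes in both cases.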

