Should I use PrefixFilter or a row key range scan in HBase?

Question

I don't know why it's so slow when I use PrefixFilter to query. Can someone explain the best way to query HBase? Thanks.

hbase(main):002:0> scan 'userlib',{FILTER=>org.apache.hadoop.hbase.filter.PrefixFilter.new(org.apache.hadoop.hbase.util.Bytes.toBytes('0000115831F8'))}
ROW               COLUMN+CELL                                                                                                                                
0000115831F8001   column=track:aid, timestamp=1339121507633, value=aaa                                                                                       
1 row(s) in 41.0700 seconds

hbase(main):002:0> scan 'userlib',{STARTROW=>'0000115831F8',ENDROW=>'0000115831F9'}                                                                                        
ROW               COLUMN+CELL                                                                                                                                
0000115831F8001   column=track:aid, timestamp=1339121507633, value=aaa                                                                                       
1 row(s) in 0.1100 seconds

Answer 1

HBase filters, even row filters, are really slow, since in most cases they do a complete table scan and then filter the results. Have a look at this discussion: http://grokbase.com/p/hbase/user/115cg0d7jh/very-slow-scan-performance-using-filters
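To make the "scan everything, then filter" cost concrete, here is a toy model in Python. It is an illustration under stated assumptions, not HBase code: the table is just a sorted Python list of row keys, and the hypothetical filtered_full_scan function reads every row and keeps the ones whose key starts with the prefix, counting how many rows it had to read.

```python
# Toy model, not the HBase client API: a "table" is a Python list of row keys
# in sorted order, mirroring how HBase stores rows sorted by key.


def filtered_full_scan(rows, prefix):
    """Read every row and keep those whose key starts with prefix.

    Returns (matches, rows_examined) so the cost of the scan is visible.
    """
    matches = []
    examined = 0
    for key in rows:
        examined += 1  # every row is read, whether or not it matches
        if key.startswith(prefix):
            matches.append(key)
    return matches, examined


# Hypothetical table: 10,000 zero-padded hex prefixes x 3 suffixes, already sorted.
rows = [f"{i:012X}{j:03d}" for i in range(10_000) for j in range(3)]
matches, examined = filtered_full_scan(rows, "000000002328")
print(matches, examined)  # 3 matching rows, 30000 rows examined
```

Only 3 rows match, yet all 30,000 are read, which is the pattern behind the slow filtered scan in the question.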

Row key range scans, however, are indeed much faster: they return the same rows as the equivalent filtered table scan, but without reading the whole table. This is because row keys are stored in sorted order (one of the basic guarantees of HBase, which is a BigTable-like store), so a range scan can go straight to the start row and stop at the end row. More explanation here: http://www.quora.com/How-feasible-is-real-time-querying-on-HBase-Can-it-be-achieved-through-a-programming-language-such-as-Python-PHP-or-JSP
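Why sorted keys make range scans cheap can be sketched with a binary search. This is a toy model, not HBase's storage engine: the table is a sorted Python list, and the hypothetical range_scan function uses the standard bisect module to find the start and stop positions, so only the rows inside the range are read. As in an HBase scan, the stop row is exclusive.

```python
import bisect

# Toy model, not the HBase client API: rows is a sorted list of row keys.


def range_scan(rows, start_row, stop_row):
    """Return every key k with start_row <= k < stop_row.

    Because rows is sorted, bisect locates both boundaries in O(log n)
    comparisons, so only the keys inside the range are read. The stop row
    is exclusive, as in an HBase scan.
    """
    lo = bisect.bisect_left(rows, start_row)
    hi = bisect.bisect_left(rows, stop_row)
    return rows[lo:hi]


# Hypothetical table: 10,000 zero-padded hex prefixes x 3 suffixes, already sorted.
rows = [f"{i:012X}{j:03d}" for i in range(10_000) for j in range(3)]
print(range_scan(rows, "000000002328", "000000002329"))
# ['000000002328000', '000000002328001', '000000002328002']
```

This has the same shape as the question's second command, which scans from STARTROW '0000115831F8' up to ENDROW '0000115831F9'.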

[UPDATE 1] It turns out that PrefixFilter scans the table from the first row until it gets past the rows matching the prefix used in the filter (if it finds any), and only then stops. The recommendation for fast performance with a PrefixFilter seems to be to specify a start_row parameter in addition to the PrefixFilter, so the scan begins at the prefix instead of at the start of the table. See the related 2013 discussion on the hbase-user mailing list.
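The effect of adding a start row can be sketched with another toy model, again an assumption for illustration rather than HBase's implementation. The hypothetical prefix_filter_scan function reads rows in sorted order, keeps the ones with the prefix, and stops once a key sorts after the prefix (modelling the early stop described in the update). Passing start_row makes it begin at that key instead of the first row.

```python
import bisect

# Toy model, not the HBase client API: rows is a sorted list of row keys.


def prefix_filter_scan(rows, prefix, start_row=None):
    """Simulate a PrefixFilter scan, optionally beginning at start_row.

    Without start_row the scan begins at the first row of the table. Rows
    before the prefix are read and discarded; once a key sorts after the
    prefix, the scan stops. Returns (matches, rows_examined).
    """
    begin = 0 if start_row is None else bisect.bisect_left(rows, start_row)
    matches = []
    examined = 0
    for i in range(begin, len(rows)):
        key = rows[i]
        examined += 1
        if key.startswith(prefix):
            matches.append(key)
        elif key > prefix:
            break  # sorted order: no later key can start with prefix
    return matches, examined


# Hypothetical table: 10,000 zero-padded hex prefixes x 3 suffixes, already sorted.
rows = [f"{i:012X}{j:03d}" for i in range(10_000) for j in range(3)]
print(prefix_filter_scan(rows, "000000002328"))  # 27004 rows examined
print(prefix_filter_scan(rows, "000000002328", start_row="000000002328"))  # 4 rows examined
```

Both calls return the same 3 rows; the only difference is how many rows are read before the scan reaches the prefix.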

[UPDATE 2, from @aaa90210] Regarding the update above, there is now an efficient row prefix filter that is much faster than PrefixFilter; see this answer: https://stackoverflow.com/a/38632100/150050
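A prefix query can always be turned into a plain range scan by computing the stop row from the prefix, which is what the question's ENDROW '0000115831F9' does by hand. The sketch below is a hypothetical helper, not an HBase API: it drops trailing 0xFF bytes (which cannot be incremented) and increments the last remaining byte; an empty result means "no stop row, scan to the end of the table". I believe this is also how HBase's Scan.setRowPrefixFilter derives its stop row, but treat that as an assumption.

```python
def stop_row_for_prefix(prefix: bytes) -> bytes:
    """Smallest row key that sorts after every key starting with prefix.

    Hypothetical helper: strip trailing 0xFF bytes, then increment the last
    remaining byte. An empty result means there is no stop row (scan to the
    end of the table).
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


print(stop_row_for_prefix(b"0000115831F8"))  # b'0000115831F9'
```

Scanning from STARTROW = prefix up to this (exclusive) stop row returns exactly the keys that start with the prefix, without reading anything before it.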

Answer 2

UPDATE: It turns out that PrefixFilter scans the table from the first row until it gets past the prefix used in the filter (if it finds it). The recommendation for fast performance with a PrefixFilter seems to be to specify a start_row parameter in addition to the PrefixFilter.

2 answers

Answer 1: 16 votes, accepted, 2012-06-08 19:31:56

Answer 2: 0 votes, 2016-06-02 14:13:19