
HBase Shell Almost 100x Faster Than RESTful Endpoint For Prefix Filter

If I run a scan with a prefix filter on the HBase shell, I get a response in less than 1 second no matter what I use for a prefix ("0" vs "9" or "a" vs "z" makes no difference in response speed).

However, when I make the same query from the Microsoft HBase library (in C#), it can take up to 90 seconds to get an answer. Interestingly, the closer the prefix is to 0, the faster the query; the further I move from 0, the longer it takes ("a" is quicker than "f" as a prefix filter).

I'm not sure how to determine what the shell is doing differently from the RESTful query, or how to make the RESTful query more performant.

Some details:

  • There are a little over 20,000,000 records in this table
  • The row key is designed as [guid]_[inverse timestamp], e.g. a6fc9620-5ff0-41c0-9ed9-660bc3fbb65c_9223370501253811889
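
For illustration, the sample key above is consistent with an inverse timestamp computed as Java's Long.MAX_VALUE minus the epoch time in milliseconds, a common HBase convention that makes newer rows sort first within a GUID. That derivation is an assumption, not something stated in the question, and the helper names below are hypothetical. A minimal Python sketch:

```python
LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE


def inverse_timestamp(epoch_millis: int) -> int:
    """Assumed convention: newer (larger) timestamps map to smaller values."""
    return LONG_MAX - epoch_millis


def make_row_key(guid: str, epoch_millis: int) -> str:
    """Build a key shaped like [guid]_[inverse timestamp] (hypothetical helper)."""
    return f"{guid}_{inverse_timestamp(epoch_millis)}"


# 1535600963918 is the epoch-millisecond value that reproduces the sample key.
key = make_row_key("a6fc9620-5ff0-41c0-9ed9-660bc3fbb65c", 1535600963918)
print(key)
```

If the GUIDs are lowercase hex as in the example, keys beginning with "f" sort toward the end of the table, which matters for any scan that has to start from the first row.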

Any thoughts on what I should look for or try in order to improve the REST API request?

Turns out this is a non-issue. I wasn't running the same commands on the shell and the REST API like I thought.

On the REST API, I was supplying two filters: a page filter and a prefix filter.

On the HBase shell I was running:

scan 'beacon', {STARTROW => 'ff', FILTER => "PageFilter(25)"}

The STARTROW isn't the same as a prefix filter. It does something more like setting the row key where the scan begins, which makes the scan performant because it doesn't traverse the whole table.
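
To make the difference concrete: a prefix filter on its own is checked against rows starting from the beginning of the table, while a start row lets the scan jump straight to the first candidate key. A prefix scan can be written as a row range by using the prefix as the start row and the prefix with its last byte incremented as the exclusive stop row. A minimal Python sketch of that calculation, assuming plain byte-wise key ordering (the function name is hypothetical and this is the general technique, not a specific client API):

```python
def prefix_stop_row(prefix: bytes) -> bytes:
    """Return the smallest key that sorts after every key starting with prefix.

    Trailing 0xFF bytes cannot be incremented, so they are dropped first.
    An empty result means there is no upper bound (scan to the end of the table).
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


start_row = b"ff"
stop_row = prefix_stop_row(start_row)
print(start_row, stop_row)  # b'ff' b'fg'
```

In the shell, the matching range would be expressed with STARTROW => 'ff' together with STOPROW => 'fg'.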

Turns out, this is what I should have been doing in the REST API call too. When I set a start and end row in addition to a prefix filter and page filter, it works quickly and as expected.
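
As a rough illustration of such a request, the sketch below builds a scanner definition for the stock HBase REST server with a start row, an end row, and both filters. The attribute names (startRow, endRow, batch), the filter JSON layout, and the base64 encoding are assumptions based on the HBase REST scanner model and should be checked against your server version; the Microsoft C# library exposes this through its own types, and the function names here are hypothetical.

```python
import base64
import json
import xml.etree.ElementTree as ET


def b64(raw: bytes) -> str:
    """Base64-encode raw key bytes as an ASCII string."""
    return base64.b64encode(raw).decode("ascii")


def build_scanner_xml(prefix: bytes, end_row: bytes, page_size: int) -> str:
    """Build a scanner body bounded by [prefix, end_row) with both filters."""
    # Assumed filter layout: a FilterList that requires every filter to pass.
    filter_spec = {
        "type": "FilterList",
        "op": "MUST_PASS_ALL",
        "filters": [
            {"type": "PrefixFilter", "value": b64(prefix)},
            {"type": "PageFilter", "value": str(page_size)},
        ],
    }
    scanner = ET.Element(
        "Scanner",
        {
            "batch": str(page_size),
            "startRow": b64(prefix),   # the scan begins at the prefix itself
            "endRow": b64(end_row),    # exclusive upper bound of the range
        },
    )
    ET.SubElement(scanner, "filter").text = json.dumps(filter_spec)
    return ET.tostring(scanner, encoding="unicode")


body = build_scanner_xml(b"ff", b"fg", 25)
print(body)
```

With the REST server, a body like this would typically be sent with Content-Type text/xml to the scanner resource of the table (here, beacon/scanner), and the results are then read from the scanner location returned by the server.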

https://community.hortonworks.com/articles/55204/recommended-way-to-do-hbase-prefix-scan-through-hb.html

Should I use prefixfilter or rowkey range scan in HBase

