
HBase - value filter with offset?

Let's say I have three rows with the following values:

+------+----------+
| row  | f1:c1    |
+------+----------+
| row1 | \x00\x00 |
| row2 | \x01\x00 |
| row3 | \x03\x01 |
+------+----------+

Is it possible to select rows with \x00 as the second byte of the value (e.g. row1 and row2)?

Further explanation

I have an immutable object that consists of a couple of UUIDs as part of my entity. Since a UUID has a fixed length, the most efficient way of storing them is to concatenate all parts into a single byte array and store it in a single column.

However, I must be able to select rows based on a specific field of said object. In theory this is pretty simple: all I need to do is take my column value at a specific offset and compare the next 16 bytes against the search value.
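As a minimal sketch of that comparison in plain Java (no HBase classes involved): the class `PackedUuids` and its methods `pack`, `matchesAt` and `fieldEquals` are hypothetical names introduced only for illustration, and the big-endian layout of each UUID (most significant 8 bytes first) is an assumption about how the fields were concatenated.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Illustrative helper (hypothetical): fixed-width UUID fields packed into one byte[]. */
final class PackedUuids {
    static final int UUID_BYTES = 16;

    /** Concatenates the UUIDs, 16 bytes each, most significant bits first (assumed layout). */
    static byte[] pack(UUID... ids) {
        ByteBuffer buf = ByteBuffer.allocate(ids.length * UUID_BYTES);
        for (UUID id : ids) {
            buf.putLong(id.getMostSignificantBits());
            buf.putLong(id.getLeastSignificantBits());
        }
        return buf.array();
    }

    /** Returns true if the bytes of expected occur in value starting at byte offset. */
    static boolean matchesAt(byte[] value, int offset, byte[] expected) {
        if (offset < 0 || offset > value.length - expected.length) {
            return false; // value is too short to contain the field at this offset
        }
        for (int i = 0; i < expected.length; i++) {
            if (value[offset + i] != expected[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns true if the UUID stored as field number fieldIndex (0-based) equals id. */
    static boolean fieldEquals(byte[] value, int fieldIndex, UUID id) {
        return matchesAt(value, fieldIndex * UUID_BYTES, pack(id));
    }
}
```

Applied to the three-row table, `PackedUuids.matchesAt(value, 1, new byte[] {0})` holds for row1 and row2 and fails for row3. The check itself never copies the stored value, which is why a server-side version of it should be cheap compared with decoding the value into a string first.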

In fact, ByteArrayComparable already kind of works that way. It takes an offset that points to the start of the value, and it seems that all I need is to add an additional offset on top of that. But I cannot figure out how to do that.

All in all, this seems to me like a widely applicable use case, so there must be a way to do it, unless I am missing something.

PS: I know that I can probably achieve what I want with RegexStringComparator, but this seems wildly inefficient.

UPDATE

HBase supports custom filters, which is perfect for my situation. Unfortunately, all the documentation I can find seems outdated, since hbase.dynamic.jars.dir and hbase.use.dynamic.jar are not present in my configuration (my version is 2.0.1).

UPDATE 2

I managed to solve this with a custom filter. It appears that they removed hbase.dynamic.jars.dir and hbase.use.dynamic.jar, but simply placing the filter on the classpath works fine.

If anyone is willing to write an answer on how to implement and use a custom filter, I will gladly award the bounty.

First, let's look more closely at custom filters in order to address this remark:

PS I know that I can probably achieve what I want with RegexStringComparator but this seems wildly inefficient.

Custom filters can be used within a scan operation in HBase. When your application executes such a scan (for example, from a Spark executor), it connects to the underlying region server over RPC, and the region server uses the same type of connection to fetch data from the data node. So where are the custom filters applied? In your application? No. Custom filters are applied to rows on the region servers, and only the matching rows are sent back to your application. This means that using these kinds of filters can help a great deal with performance.

Secondly, if you need to select rows based on their values, you can use different kinds of filters, but SingleColumnValueFilter may be the most useful one for working with values. A complete list of custom filters is presented here. Additionally, RegexStringComparator can be used as the SingleColumnValueFilter comparator; here is an example:

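// regexPattern is a String; family and qualifier are the byte[] column coordinates.
RegexStringComparator regexStringComparator =
        new RegexStringComparator(regexPattern);
// Keep only rows whose family:qualifier value matches the regex.
SingleColumnValueFilter singleColumnValueFilter =
        new SingleColumnValueFilter(family, qualifier,
                CompareOp.EQUAL, regexStringComparator);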
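For the "second byte is \x00" case from the question, one possible value for regexPattern is sketched below using only java.util.regex, so it can be tried without a cluster. It rests on two assumptions that should be checked against your HBase version: that the comparator is configured to decode values as ISO-8859-1 (one char per byte, so character positions equal byte positions), and that it searches the decoded value rather than requiring a full match. The class name `SecondByteRegex` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Illustrative only: checks the candidate pattern with plain Java regex, not HBase. */
final class SecondByteRegex {
    // Anchor at the start, skip exactly one arbitrary byte, then require \x00.
    // DOTALL makes '.' match line terminators too, which matters for binary data.
    static final String REGEX = "^.{1}\\x00";
    static final Pattern SECOND_BYTE_ZERO = Pattern.compile(REGEX, Pattern.DOTALL);

    static boolean matches(byte[] value) {
        // ISO-8859-1 maps every byte 0x00..0xFF to exactly one char (assumed charset).
        String decoded = new String(value, StandardCharsets.ISO_8859_1);
        return SECOND_BYTE_ZERO.matcher(decoded).find();
    }
}
```

For a 16-byte field at a larger offset, the same shape applies: `^.{N}` followed by the escaped bytes of the search value. Every row's value is still decoded into a string on the server before matching, which is the overhead the question is worried about.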


 