How to filter an HBase Scan by part of the row key?

Question

I have an HBase table whose row keys consist of a text ID and a timestamp, like this:

...
string_id1.1470913344067
string_id1.1470913345067
string_id2.1470913344067
string_id2.1470913345067
...

How can I filter an HBase Scan (in Scala or Java) to get only the results with a given string ID and a timestamp greater than some value?

Thanks

Answer 1

The fuzzy row approach is efficient for this kind of requirement, especially when the data set is huge. As explained in this article, FuzzyRowFilter takes a row key and mask info as parameters.

In the article's example, if we want to find the most recently logged-in users and the row key format is userId_actionId_timestamp (where userId has a fixed length of, say, 4 chars), the fuzzy row key we are looking for is ????_login_. This translates into the following params for FuzzyRowFilter:

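import java.util.Arrays;
import org.apache.hadoop.hbase.filter.FuzzyRowFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;
// Mask: 1 = any byte at this position, 0 = byte must match. Note the escaped backslashes.
FuzzyRowFilter rowFilter = new FuzzyRowFilter(
 Arrays.asList(
  new Pair<byte[], byte[]>(
    Bytes.toBytesBinary("\\x00\\x00\\x00\\x00_login_"),
    new byte[] {1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0})));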
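
To make the mask semantics concrete, here is a minimal plain-Java sketch that models only the matching rule: a mask byte of 1 means "any byte at this position", and 0 means "must equal the pattern byte". The class FuzzyMaskSketch and its helpers fuzzyMatches and ascii are hypothetical names made up for illustration and are not part of HBase. The real FuzzyRowFilter runs on the region servers and is more involved. As a simplifying assumption, rows shorter than the pattern are rejected here.

```java
import java.nio.charset.StandardCharsets;

// Illustrative model of fuzzy-mask matching; NOT the HBase implementation.
final class FuzzyMaskSketch {

    // True when rowKey equals pattern at every position whose mask byte is 0.
    // Positions whose mask byte is 1 are wildcards; bytes past the pattern are ignored.
    static boolean fuzzyMatches(byte[] rowKey, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask must have equal length");
        }
        if (rowKey.length < pattern.length) {
            return false; // simplifying assumption for this sketch
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && rowKey[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: encode an ASCII string as bytes.
    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] pattern = ascii("????_login_");
        byte[] mask = {1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0};
        System.out.println(fuzzyMatches(ascii("u042_login_1470913344067"), pattern, mask)); // true
        System.out.println(fuzzyMatches(ascii("u042_click_1470913344067"), pattern, mask)); // false
    }
}
```

Because the mask is positional, this approach needs the wildcard part of the key to have a fixed length, which is why the example assumes a 4-character userId.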

I would suggest reading HBase: The Definitive Guide --> Client API: Advanced Features.

Answer 2

Let's say you somehow ended up with your lines in a monadic traversable structure like List or RDD. Now you want only the strings with id = "string_id2" and timestamp > 1470913345000.

So what is the problem here? Just filter your traversable monadic structure on these two criteria.

val filtered = listOrRddOfLines
  .map(l => {
    // Split "id.timestamp" on the dot; throws a MatchError if there is not exactly one dot
    val idStr :: timestampStr :: Nil = l.split('.').toList
    (idStr, timestampStr.toLong)
  })
  .filter({
    case (idStr, timestamp) => idStr == "string_id2" && timestamp > 1470913345000L
  })

Answer 3

I resolved my problem by using two filters:
- PrefixFilter (I pass this filter the first part of the row key. In my case that is the string ID, for example "string_id1.")
- RowFilter (I pass it two parameters: first, CompareOp.GREATER_OR_EQUAL; second, the full row key with the required timestamp, for example "string_id1.1470913345000")

As a result, I get all cells whose row key has the required string_id as its first part and a timestamp greater than or equal to the one I put in the filter as its second part. It is exactly what I want.

Code snippet:

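import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.filter.{BinaryComparator, FilterList, PrefixFilter, RowFilter}
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp
import org.apache.hadoop.hbase.util.Bytes
// family, prefixOfRowKey, valueForBinaryFilter and table are defined elsewhere
val s = new Scan()
s.addFamily(family.getBytes)
val filterList = new FilterList()
filterList.addFilter(new PrefixFilter(Bytes.toBytes(prefixOfRowKey)))
filterList.addFilter(new RowFilter(CompareOp.GREATER_OR_EQUAL, new BinaryComparator(valueForBinaryFilter.getBytes())))
s.setFilter(filterList)
val scanner = table.getScanner(s)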
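
Why the two filters together give the desired result can be seen in a small plain-Java sketch that mimics their combined predicate on the sample keys from the question. It does not use HBase at all: RowKeyRangeSketch, compareBytes, startsWith and select are hypothetical names for illustration. The sketch assumes that both filters must pass (the default behaviour of a FilterList) and that rows are compared byte by byte in unsigned lexicographic order.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model of "prefix AND row >= bound"; NOT HBase code.
final class RowKeyRangeSketch {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True when row begins with all bytes of prefix.
    static boolean startsWith(byte[] row, byte[] prefix) {
        if (row.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (row[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Keeps the row keys that pass both checks.
    static List<String> select(List<String> rowKeys, String prefix, String lowerBound) {
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        byte[] lb = lowerBound.getBytes(StandardCharsets.UTF_8);
        List<String> kept = new ArrayList<>();
        for (String key : rowKeys) {
            byte[] row = key.getBytes(StandardCharsets.UTF_8);
            if (startsWith(row, p) && compareBytes(row, lb) >= 0) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
            "string_id1.1470913344067", "string_id1.1470913345067",
            "string_id2.1470913344067", "string_id2.1470913345067");
        // prints [string_id1.1470913345067]
        System.out.println(select(keys, "string_id1.", "string_id1.1470913345000"));
    }
}
```

Note that the comparison is on bytes, not on numbers, so it only orders timestamps correctly when they all have the same number of digits (as the 13-digit millisecond values here do). A key such as "string_id1.999" would pass the check even though 999 is numerically smaller than the bound.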

Thanks to everyone who helped to find a solution.

3 answers

Answer 1
5 2016-08-16 05:34:26

Answer 2
-2 2016-08-11 14:55:09

Answer 3
-2 accepted 2016-08-12 09:42:02
