
How can I scan over sets of rows in HappyBase with a single API call?

I want to scan a big table for a list of IDs (or prefixes of IDs), using Python HappyBase.

Is there any way to do it on the server side? That is, I'd like to send a list of start/stop rows to be scanned in one API call rather than performing a long series of API calls.

Here's an example. Given these row keys in my_big_tables:

2019/1
2019/2
2019/3
...
2020/1
2020/2
2020/3
2020/4
..

In one query, I'd like to get all the records from months 1 and 2 for all years. The results should be:

2019/1
2019/2
2020/1
2020/2

Rather than using the row_start and row_stop arguments in Table.scan(), this may be a better fit for the filter argument with a regular expression.

See the API reference for details on the filter argument:

The keyword argument filter is also supported (beyond column and row range filters supported here). HappyBase / HBase users will have used this as an HBase filter string. (See the Thrift docs for more details on those filters.) However, Google Cloud Bigtable doesn't support those filter strings so a RowFilter should be used instead.

RowFilter is a type provided by Google's Bigtable library. Here are the docs. Assuming that the ID field you're referring to is your row key, we can use RowKeyRegexFilter to filter the IDs by the pattern you've described.

We'll start by coming up with a regular expression to match a list of IDs for the desired months. For example, if you wanted to filter year-based IDs for the months of December and January, you could use the pattern below (note that the alternatives must go from the longest number to the shortest, so that 12 is tried before 1). See this link to test the regular expression:

\d\d\d\d\/(12|1)
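
As a quick local check of the pattern above, the sample keys can be run through Python's re module. This is only a sketch: Bigtable evaluates row-key regexes with RE2 rather than Python's re, and both the sample key list and the use of re.fullmatch (whole-key matching) are assumptions made for illustration, so the real filter's matching semantics should be confirmed against the Bigtable documentation.

```python
import re

# Hypothetical sample keys, modeled on the ones in the question.
sample_keys = ["2019/1", "2019/2", "2019/3", "2019/12", "2020/1", "2020/11"]

# Pattern from the answer: four digits, a slash, then month 12 or 1.
pattern = re.compile(r"\d\d\d\d\/(12|1)")

# fullmatch requires the whole key to match (an assumption about the filter's semantics).
matched = [key for key in sample_keys if pattern.fullmatch(key)]
print(matched)
```

With a prefix-style match such as pattern.match instead, a key like 2020/11 would also pass, because 1 matches its first month digit. That makes it worth checking how the filter anchors the pattern before relying on it.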

Here's an attempt to write a function that creates a Google Bigtable HappyBase scan call with an appropriate filter, where table is a HappyBase table and months is a list of integers. Please note that I have not tested this code, but hopefully it at least gives you a starting point.

from google.cloud.bigtable.row_filters import RowKeyRegexFilter

def filter_by_months(table, months):
    # List larger month numbers first so that "12" is tried before "1".
    months_reversed = sorted(months, reverse=True)
    months_strings = [str(month) for month in months_reversed]
    months_joined = "|".join(months_strings)

    # Raw string, so the regex backslashes are not read as Python string escapes.
    key_filter = RowKeyRegexFilter(r'\d\d\d\d\/({})'.format(months_joined))
    return table.scan(filter=key_filter)
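
Since the function above is untested, one way to gain some confidence without a Bigtable connection is to pull the pattern construction into its own helper and inspect the string it produces. The sketch below does that. The name build_month_pattern is hypothetical, and the de-duplication via set() is an addition not present in the original function.

```python
def build_month_pattern(months):
    """Return a row-key regex string for the given month numbers."""
    # Larger month numbers first, mirroring the ordering used in the answer.
    # set() drops repeated months (an addition for this sketch).
    ordered = sorted(set(months), reverse=True)
    alternatives = "|".join(str(month) for month in ordered)
    # Raw string keeps the regex backslashes intact.
    return r"\d\d\d\d\/({})".format(alternatives)


print(build_month_pattern([1, 2]))
print(build_month_pattern([1, 12]))
```

The returned string could then be passed to RowKeyRegexFilter in place of the inline format call.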
