
Will partitioning in Google BigQuery improve join performance?

I have a table with around 800k rows (which I didn't think was a lot). It is created from a series of other tables. I am then joining this table with another table of about 5M rows (using the Python client), but it appears to be taking forever. In the NoSQL and SQL world I would create an index. In BQ, I think the equivalent is a partition, or can I create an index?

I'm using Python and the following to create the table:

query = """
CREATE OR REPLACE TABLE `{table_name}` AS
WITH get_all_affiliate AS (
""".format(table_name=table_name)

and

query += """
    ) SELECT * from get_all_table
    """

and then

response = client.query(query).result()

How can I easily CAST, and also apply some kind of indexing or partitioning, on a field that is a string but can be recast as an integer?

As @Samuel mentioned in the comments, partitioning can be used to optimize a query in BigQuery. However, if the two tables need to be joined in full, it does not help, since the JOIN combines all elements of both tables, which defeats the purpose of partitioning. For more information, you may refer to this documentation.

You can use the following to cast a string column to an integer:

CAST(string_column_A AS INT64) AS temporary_column_A
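For the "indexing" part of the question, the closest built-in BigQuery feature is probably clustering rather than an index. Below is a minimal sketch, not a definitive implementation, of how the cast could be folded into the `CREATE OR REPLACE TABLE` statement so the new table is clustered on the integer version of the column. The helper name `build_clustered_ctas` and the table and column names are hypothetical, `SAFE_CAST` is swapped in for `CAST` on the assumption that unparseable strings should become NULL instead of failing the query, and whether clustering actually speeds up this particular join would need to be measured.

```python
def build_clustered_ctas(table_name: str, source_table: str,
                         string_column: str, int_column: str) -> str:
    """Build a CTAS statement that casts a string column to INT64 and
    clusters the new table on the resulting integer column.

    All names passed in are illustrative placeholders, not taken from
    the original question.
    """
    return f"""
CREATE OR REPLACE TABLE `{table_name}`
CLUSTER BY {int_column} AS
SELECT
  * EXCEPT ({string_column}),
  SAFE_CAST({string_column} AS INT64) AS {int_column}
FROM `{source_table}`
""".strip()


if __name__ == "__main__":
    # Hypothetical project, dataset, table, and column names.
    sql = build_clustered_ctas(
        table_name="my_project.my_dataset.affiliates_clustered",
        source_table="my_project.my_dataset.affiliates_raw",
        string_column="string_column_A",
        int_column="column_A_int",
    )
    print(sql)
    # To execute it with the Python client used in the question:
    #   from google.cloud import bigquery
    #   bigquery.Client().query(sql).result()
```

If the other table in the join is clustered on the same key, BigQuery may be able to skip blocks when the join is combined with a filter on that key; checking the query plan and bytes processed before and after is the safest way to confirm any benefit.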

Posting the answer as community wiki for the benefit of the community that might encounter this use case in the future.

Feel free to edit this answer for additional information.
