
Will partitioning in Google BigQuery improve join performance?

I have a table with around 800k rows (which I didn't think was a lot). It is created from a series of other tables. I am then joining this table with another table of about 5M rows (using the Python client), but the join appears to be taking forever. In the NoSQL and SQL world I would create an index. In BigQuery, is the equivalent a partition, or can I create an index?

I'm using Python and the following to create a table:

query = """
CREATE OR REPLACE TABLE `{table_name}` AS
WITH get_all_affiliate AS (
""".format(table_name=table_name)

and

query += """
    ) SELECT * FROM get_all_affiliate
    """

and then

response = client.query(query).result()
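
For context, the assembled statement runs roughly like this (the project, dataset, table names, and the CTE body below are placeholders, not the real ones):

from google.cloud import bigquery

client = bigquery.Client()

table_name = "my_project.my_dataset.result_table"  # placeholder for the real table

query = """
CREATE OR REPLACE TABLE `{table_name}` AS
WITH get_all_affiliate AS (
    -- placeholder for the real series of source-table SELECTs
    SELECT * FROM `my_project.my_dataset.source_table`
)
SELECT * FROM get_all_affiliate
""".format(table_name=table_name)

response = client.query(query).result()  # waits for the CREATE TABLE job to finish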

How can I easily CAST this field and also add some kind of indexing/partitioning on it, given that it is stored as a string but can be recast as an integer?

As @Samuel mentioned in the comments, partitioning can be used to optimize a query in BigQuery. However, it does not help a plain join of the two tables, since the JOIN still has to combine rows from both tables, which defeats the purpose of partition pruning. For more information, you may refer to this documentation.
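
If you still want to partition the created table on that field, integer-range partitioning on the cast column would look roughly like this. This is a minimal sketch: the table names, the column string_column_A, and the bucket boundaries are hypothetical, not taken from the question.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical example: recreate the result table partitioned on the cast column.
# The RANGE_BUCKET boundaries (0 to 1,000,000 in steps of 10,000) are placeholders.
query = """
CREATE OR REPLACE TABLE `my_project.my_dataset.result_table`
PARTITION BY RANGE_BUCKET(id_int, GENERATE_ARRAY(0, 1000000, 10000))
AS
SELECT
  CAST(string_column_A AS INT64) AS id_int,
  t.*
FROM `my_project.my_dataset.source_table` AS t
"""
client.query(query).result()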

You can cast the string column to an integer like this:

CAST(string_column_A AS INT64) AS temporary_column_A
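
For example, the cast can be applied directly in the join condition. The table and column names below are hypothetical, and SAFE_CAST is used instead of CAST so that strings that do not parse as integers produce NULL rather than failing the query:

from google.cloud import bigquery

client = bigquery.Client()

# Join the ~5M-row table to the ~800k-row table on the cast key.
# All table and column names below are hypothetical.
query = """
SELECT
  big.*,
  small.attribute_column
FROM `my_project.my_dataset.big_table` AS big
JOIN `my_project.my_dataset.small_table` AS small
  ON big.id_int = SAFE_CAST(small.string_column_A AS INT64)
"""
rows = client.query(query).result()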

Posting the answer as community wiki for the benefit of the community that might encounter this use case in the future.

Feel free to edit this answer for additional information.
