BigQuery: creating a table with time partitioning and clustering fields using Python

Question

I can successfully create a BigQuery table in Python as follows:

from google.cloud import bigquery

bq_client = bigquery.Client()
table_name = "my_test_table"

dataset = bq_client.dataset("MY_TEST_DATASET")
table_ref = dataset.table(table_name)
table = bigquery.Table(table_ref)
table = bq_client.create_table(table)

Later, I upload a local Pandas DataFrame with:

# --- Define BQ options ---
job_config = bigquery.LoadJobConfig()
job_config.write_disposition = "WRITE_APPEND"
job_config.source_format = bigquery.SourceFormat.CSV

# --- Load data ---
job = bq_client.load_table_from_dataframe(
        df, f"MY_TEST_DATASET.{table_name}", job_config=job_config
    )

When creating the table from Python, how can I specify:

Partitioning by daily ingestion time
Clustering on the fields ["business_id", "software_house", "product_id"]

Answer 1

You can create a BigQuery table with partitioning and clustering using the following Python script:

def create_table():
    from google.cloud import bigquery

    # Construct a BigQuery client object.
    client = bigquery.Client()

    table_id = "your_project.your_dataset.table_test"

    schema = [
        bigquery.SchemaField("business_id", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("software_house", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("product_id", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("other_field", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("ingestion_time", "TIMESTAMP", mode="NULLABLE"),
    ]

    table = bigquery.Table(table_id, schema=schema)

    # Clustering.
    table.clustering_fields = ["business_id", "software_house", "product_id"]

    # Partitioning.
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="ingestion_time",  # name of column to use for partitioning
        expiration_ms=7776000000
    )  # 90 days

    table = client.create_table(table)
    print(
        "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
    )


if __name__ == '__main__':
    create_table()

In this case:

A daily partition is added on the ingestion_time field
The table is clustered on the fields ["business_id", "software_house", "product_id"]
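The script above also passes expiration_ms=7776000000, which its comment describes as 90 days. As a small sketch of where that number comes from, the helper below (partition_expiration_ms is a hypothetical name, not part of the BigQuery client library) converts a retention period in days to milliseconds using only the standard library:

```python
from datetime import timedelta


def partition_expiration_ms(days: int) -> int:
    """Convert a retention period in days to whole milliseconds."""
    # Dividing one timedelta by another with // yields an exact integer.
    return timedelta(days=days) // timedelta(milliseconds=1)


# 90 days * 24 h * 3600 s * 1000 ms
ninety_days_ms = partition_expiration_ms(90)
print(ninety_days_ms)
```

The returned value can be passed as expiration_ms to bigquery.TimePartitioning in place of the hard-coded literal, which keeps the intended retention period readable.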

These documentation pages show how to add partitioning and clustering on fields:

BQ table partitioning
BQ table clustering

The result in BigQuery is: [screenshot]
