BigQuery: Create table with time partitioning and clustering fields using Python

I can successfully create a BigQuery table in Python as follows:

from google.cloud import bigquery

bq_client = bigquery.Client()
table_name = "my_test_table"

dataset = bq_client.dataset("MY_TEST_DATASET")
table_ref = dataset.table(table_name)
table = bigquery.Table(table_ref)
table = bq_client.create_table(table)

Later I upload a local Pandas DataFrame with:

# --- Define BQ options ---
job_config = bigquery.LoadJobConfig()
job_config.write_disposition = "WRITE_APPEND"
job_config.source_format = bigquery.SourceFormat.CSV

# --- Load data ---
job = bq_client.load_table_from_dataframe(
        df, f"MY_TEST_DATASET.{table_name}", job_config=job_config
    )

How can I specify the following when creating the table with Python?

  1. Partitioning by daily ingestion time
  2. Clustering on the fields ["business_id", "software_house", "product_id"]

You can create a BigQuery table with partitioning and clustering using the following Python script:

def create_table():
    from google.cloud import bigquery

    # Construct a BigQuery client object.
    client = bigquery.Client()

    table_id = "your_project.your_dataset.table_test"

    schema = [
        bigquery.SchemaField("business_id", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("software_house", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("product_id", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("other_field", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("ingestion_time", "TIMESTAMP", mode="NULLABLE"),
    ]

    table = bigquery.Table(table_id, schema=schema)

    # Clustering.
    table.clustering_fields = ["business_id", "software_house", "product_id"]

    # Partitioning.
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="ingestion_time",  # name of column to use for partitioning
        expiration_ms=7776000000
    )  # 90 days

    table = client.create_table(table)
    print(
        "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
    )


if __name__ == '__main__':
    create_table()

In this case:

  • A daily partition is added on the ingestion_time field
  • Clustering is added on the fields ["business_id", "software_house", "product_id"]
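
Note that the script above partitions on a regular TIMESTAMP column that happens to be named ingestion_time. If partitioning by BigQuery's own ingestion time is what is wanted, a minimal sketch is to leave `field` unset in `bigquery.TimePartitioning`; the table is then partitioned on the `_PARTITIONTIME` pseudo-column. The helper names `days_to_ms`, `build_ingestion_time_table` and `create_ingestion_time_table` are illustrative assumptions, not part of the library, and the table ID is a placeholder.

```python
def days_to_ms(days):
    """Convert a number of days to milliseconds (90 days -> 7776000000)."""
    return days * 24 * 60 * 60 * 1000


def build_ingestion_time_table(table_id, expiration_days=None):
    """Build (but do not create) a table definition partitioned by ingestion time."""
    from google.cloud import bigquery

    # The clustering columns must exist in the schema; no timestamp
    # column is needed for ingestion-time partitioning.
    schema = [
        bigquery.SchemaField("business_id", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("software_house", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("product_id", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("other_field", "STRING", mode="NULLABLE"),
    ]
    table = bigquery.Table(table_id, schema=schema)
    table.clustering_fields = ["business_id", "software_house", "product_id"]

    # No `field` argument: partition daily on ingestion time (_PARTITIONTIME).
    partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY
    )
    if expiration_days is not None:
        partitioning.expiration_ms = days_to_ms(expiration_days)
    table.time_partitioning = partitioning
    return table


def create_ingestion_time_table(table_id, expiration_days=None):
    """Send the table definition to BigQuery (requires credentials)."""
    from google.cloud import bigquery

    client = bigquery.Client()
    return client.create_table(
        build_ingestion_time_table(table_id, expiration_days)
    )
```

Calling `create_ingestion_time_table("your_project.your_dataset.table_test", expiration_days=90)` would need valid credentials and an existing dataset; `build_ingestion_time_table` alone makes no network call.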

The BigQuery documentation describes how to add partitioning and clustering on fields.

The result in BigQuery is:

[Image]
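
For the DataFrame upload in the question, the same settings can also be placed on the load job itself, since `bigquery.LoadJobConfig` exposes `time_partitioning` and `clustering_fields` properties. The sketch below assumes the destination table does not exist yet, so the load job creates it with these settings (if the table already exists, its partitioning and clustering have to match). The function names `build_load_config` and `load_dataframe` are hypothetical. The `source_format` line from the question is left out here because `load_table_from_dataframe` serializes the DataFrame to Parquet by default.

```python
def build_load_config():
    """Build a load job config with daily ingestion-time partitioning and clustering."""
    from google.cloud import bigquery

    job_config = bigquery.LoadJobConfig()
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND

    # No `field` argument: partition daily on ingestion time.
    job_config.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY
    )
    # These columns are assumed to be present in the DataFrame.
    job_config.clustering_fields = ["business_id", "software_house", "product_id"]
    return job_config


def load_dataframe(df, table_id):
    """Upload a pandas DataFrame and wait for the load job to finish."""
    from google.cloud import bigquery

    client = bigquery.Client()
    job = client.load_table_from_dataframe(
        df, table_id, job_config=build_load_config()
    )
    return job.result()  # blocks until the job completes; raises on failure
```

With the names from the question, this would be called as `load_dataframe(df, f"MY_TEST_DATASET.{table_name}")`.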
