BigQuery: Create table with time partitioning and clustering fields using Python
I can successfully create a BigQuery table in Python as follows:
from google.cloud import bigquery
bq_client = bigquery.Client()
table_name = "my_test_table"
dataset = bq_client.dataset("MY_TEST_DATASET")
table_ref = dataset.table(table_name)
table = bigquery.Table(table_ref)
table = bq_client.create_table(table)
Later, I upload a local Pandas DataFrame like this:
# --- Define BQ options ---
job_config = bigquery.LoadJobConfig()
job_config.write_disposition = "WRITE_APPEND"
job_config.source_format = bigquery.SourceFormat.CSV
# --- Load data ---
job = bq_client.load_table_from_dataframe(
    df, f"MY_TEST_DATASET.{table_name}", job_config=job_config
)
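How can I specify time partitioning and clustering fields when creating the table with Python?
You can use the following Python script to create a BigQuery table with partitioning and clustering: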
def create_table():
    from google.cloud import bigquery

    # Construct a BigQuery client object.
    client = bigquery.Client()

    table_id = "your_project.your_dataset.table_test"

    schema = [
        bigquery.SchemaField("business_id", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("software_house", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("product_id", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("other_field", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("ingestion_time", "TIMESTAMP", mode="NULLABLE"),
    ]

    table = bigquery.Table(table_id, schema=schema)

    # Clustering.
    table.clustering_fields = ["business_id", "software_house", "product_id"]

    # Partitioning.
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="ingestion_time",  # name of column to use for partitioning
        expiration_ms=7776000000,  # 90 days
    )

    table = client.create_table(table)
    print(
        "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
    )


if __name__ == '__main__':
    create_table()
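The hard-coded expiration_ms value is easy to get wrong, so here is a minimal sketch of deriving it from a number of days and of building the table definition separately from the API call. The helper names days_to_ms and build_table are hypothetical (they are not part of the google-cloud-bigquery library), and build_table only constructs the Table object; it does not create anything in BigQuery.

```python
from datetime import timedelta


def days_to_ms(days):
    """Convert a number of days to whole milliseconds (hypothetical helper)."""
    return int(timedelta(days=days).total_seconds() * 1000)


def build_table(table_id, schema, partition_field, cluster_fields, expiration_days):
    """Return an unsaved bigquery.Table with partitioning and clustering set.

    Hypothetical helper: schema is a list of bigquery.SchemaField objects.
    """
    # Imported lazily, as in the script above, so that days_to_ms can be
    # used even where google-cloud-bigquery is not installed.
    from google.cloud import bigquery

    table = bigquery.Table(table_id, schema=schema)
    table.clustering_fields = list(cluster_fields)
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field=partition_field,
        expiration_ms=days_to_ms(expiration_days),
    )
    return table


print(days_to_ms(90))  # 7776000000, the value used in the script above
```

The object returned by build_table could then be passed to client.create_table(...) exactly as in the script above.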
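In this case, the partition is added on the ingestion_time field, and clustering is added on the business_id, software_house and product_id fields. The BigQuery documentation shows how to add partitioning and clustering on fields.
The resulting table in BigQuery is partitioned by day on ingestion_time, with a partition expiration of 90 days.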
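As an alternative that was not part of the original answer, the same table layout can probably be expressed as a BigQuery DDL statement and submitted from Python with client.query(). The sketch below only builds the SQL string; build_create_table_ddl is a hypothetical helper, and the table and column names are the placeholders from the script above. It interpolates identifiers directly into the statement, so it assumes trusted input.

```python
def build_create_table_ddl(table_id, columns, partition_field, cluster_fields,
                           expiration_days):
    """Build a CREATE TABLE statement with daily partitioning and clustering.

    Hypothetical helper: columns is a list of (name, sql_type) pairs.
    Identifiers are inserted as-is, so only pass trusted values.
    """
    column_sql = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    cluster_sql = ", ".join(cluster_fields)
    return (
        f"CREATE TABLE `{table_id}` (\n  {column_sql}\n)\n"
        f"PARTITION BY DATE({partition_field})\n"
        f"CLUSTER BY {cluster_sql}\n"
        f"OPTIONS (partition_expiration_days = {expiration_days})"
    )


ddl = build_create_table_ddl(
    "your_project.your_dataset.table_test",
    [
        ("business_id", "STRING"),
        ("software_house", "STRING"),
        ("product_id", "STRING"),
        ("other_field", "STRING"),
        ("ingestion_time", "TIMESTAMP"),
    ],
    "ingestion_time",
    ["business_id", "software_house", "product_id"],
    90,
)
print(ddl)

# To run the statement (needs credentials and google-cloud-bigquery):
# from google.cloud import bigquery
# bigquery.Client().query(ddl).result()
```

The DDL route keeps the table definition in one reviewable statement, while the Table object route in the answer keeps everything in Python types; either should produce a day-partitioned, clustered table.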