How to insert into a specific partition of a BigQuery ingestion-time partitioned table in Python, by specifying the partition
I found that this can be done with SQL DML, as described here: https://cloud.google.com/bigquery/docs/using-dml-with-partitioned-tables
But I don't know how to express it in Python. I am considering using `client.load_table_from_dataframe` from the google-cloud-bigquery module: https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html#google.cloud.bigquery.client.Client.load_table_from_dataframe
I found the following sample, but when I use the column name `_PARTITIONTIME` I get the error below: https://cloud.google.com/bigquery/docs/samples/bigquery-load-table-partitioned#bigquery_load_table_partitioned-python
google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/aaa/jobs?uploadType=multipart: Invalid field name "_PARTITIONTIME". Field names are not allowed to start with the (case-insensitive) prefixes _PARTITION, _TABLE_, _FILE_, _ROW_TIMESTAMP, __ROOT__ and _COLIDENTIFIER
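The error message itself lists the reserved prefixes, so they can be checked up front. A small pre-flight check (a hypothetical helper, not part of the BigQuery client) can flag offending column names before a load job is submitted:

```python
# Reserved (case-insensitive) column-name prefixes, taken from the
# BadRequest message above.
RESERVED_PREFIXES = (
    "_PARTITION",
    "_TABLE_",
    "_FILE_",
    "_ROW_TIMESTAMP",
    "__ROOT__",
    "_COLIDENTIFIER",
)

def invalid_field_names(columns):
    """Return the column names BigQuery would reject as field names."""
    return [
        name for name in columns
        if any(name.upper().startswith(p) for p in RESERVED_PREFIXES)
    ]

print(invalid_field_names(["c1", "c2", "_PARTITIONTIME"]))  # ['_PARTITIONTIME']
```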
CREATE TABLE IF NOT EXISTS `aaa.bbb.ccc` (
  c1 INTEGER,
  c2 STRING
)
PARTITION BY _PARTITIONDATE;
INSERT INTO `aaa.bbb.ccc` (c1, c2, _PARTITIONTIME) VALUES (99, "zz", TIMESTAMP("2000-01-02"));
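The DML route from the linked documentation can also be driven from Python with `client.query` instead of a load job. A minimal sketch; the helper name is my own, and actually running the query requires credentials and the table above:

```python
from datetime import date

def build_partition_insert(table: str, c1: int, c2: str, day: date) -> str:
    """Hypothetical helper: DML INSERT that writes the _PARTITIONTIME
    pseudo column of an ingestion-time partitioned table explicitly."""
    return (
        f"INSERT INTO `{table}` (c1, c2, _PARTITIONTIME) "
        f"VALUES ({c1}, '{c2}', TIMESTAMP('{day.isoformat()}'))"
    )

# To run it (needs google-cloud-bigquery and credentials):
#   client = bigquery.Client(project="aaa")
#   client.query(build_partition_insert("aaa.bbb.ccc", 99, "zz", date(2000, 1, 2))).result()
```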
import pandas as pd
from google.cloud import bigquery
from google.cloud.bigquery.enums import SqlTypeNames
from google.cloud.bigquery.job import WriteDisposition
from datetime import datetime

client = bigquery.Client(project="aaa")
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("c1", SqlTypeNames.INTEGER),
        bigquery.SchemaField("c2", SqlTypeNames.STRING),
        bigquery.SchemaField("_PARTITIONTIME", SqlTypeNames.TIMESTAMP),
    ],
    write_disposition=WriteDisposition.WRITE_APPEND,
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="_PARTITIONTIME",  # Name of the column to use for partitioning.
        expiration_ms=7776000000,  # 90 days.
    ),
)
df = pd.DataFrame(
    [
        [1, "a", datetime.strptime("2100-11-12", "%Y-%m-%d")],
        [2, "b", datetime.strptime("2101-12-13", "%Y-%m-%d")],
    ],
    columns=["c1", "c2", "_PARTITIONTIME"],
)
job = client.load_table_from_dataframe(df, "aaa.bbb.ccc", job_config=job_config)  # error
result = job.result()
I have also asked the following question: https://ja.stackoverflow.com/questions/90760
You can change the name `_PARTITIONTIME` to another name, since it starts with one of the reserved (case-insensitive) prefixes. The following code works:
import pandas as pd
from google.cloud import bigquery
from google.cloud.bigquery.enums import SqlTypeNames
from google.cloud.bigquery.job import WriteDisposition
from datetime import datetime

client = bigquery.Client(project="<your-project>")
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("c1", SqlTypeNames.INTEGER),
        bigquery.SchemaField("c2", SqlTypeNames.STRING),
        bigquery.SchemaField("_P1", SqlTypeNames.TIMESTAMP),
    ],
    write_disposition=WriteDisposition.WRITE_APPEND,
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="_P1",  # Name of the column to use for partitioning.
        expiration_ms=7776000000,  # 90 days.
    ),
)
df = pd.DataFrame(
    [
        [1, "a", datetime.strptime("2100-11-12", "%Y-%m-%d")],
        [2, "b", datetime.strptime("2101-12-13", "%Y-%m-%d")],
    ],
    columns=["c1", "c2", "_P1"],
)
job = client.load_table_from_dataframe(df, "<your-project>.<your-dataset>.ccc", job_config=job_config)
result = job.result()
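If you want to keep a true ingestion-time partitioned table instead (no extra timestamp column at all), load jobs can also target a single partition via a `$YYYYMMDD` partition decorator on the destination table id. A sketch under that assumption; the helper below only builds the decorated id, and the commented usage is hypothetical:

```python
from datetime import date

def partition_decorator(table_id: str, day: date) -> str:
    """Destination id addressing one ingestion-time partition,
    e.g. 'aaa.bbb.ccc$20000102'."""
    return f"{table_id}${day.strftime('%Y%m%d')}"

# Hypothetical usage with a table created via PARTITION BY _PARTITIONDATE
# (needs credentials; the DataFrame then carries only real columns):
#   destination = partition_decorator("aaa.bbb.ccc", date(2000, 1, 2))
#   client.load_table_from_dataframe(df[["c1", "c2"]], destination,
#                                    job_config=job_config)
```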
As for the INSERT query:
INSERT INTO `<your-project>.<your-dataset>.ccc` (c1, c2, _P1) VALUES (99, "zz", TIMESTAMP("2000-01-02"));
As explained in this SO post answered by a Googler, the query above will not work. Because the expiration_ms field declares a 90-day partition expiration, only partition dates within 90 days of the current date (the date the Python script runs) are valid; anything older is expired immediately. This query will work:
INSERT INTO `<your-project>.<your-dataset>.ccc` (c1, c2, _P1) VALUES (99, "zz", TIMESTAMP("2022-06-01"));
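Given the 90-day `expiration_ms` in the job config, the oldest partition date that still survives can be computed up front. A small sketch; the helper name is mine:

```python
from datetime import datetime, timedelta, timezone

EXPIRATION_MS = 7776000000  # 90 days, matching expiration_ms above.

def oldest_valid_partition(now=None):
    """Earliest partition date not immediately removed by partition expiration."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(milliseconds=EXPIRATION_MS)).date()

print(oldest_valid_partition(datetime(2022, 9, 1, tzinfo=timezone.utc)))  # 2022-06-03
```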