BigQuery data transfer failing if target table is not daily-partitioned
I have a BigQuery data transfer job set up to a destination table which is partitioned by month. The table was created with the following command:
bq mk --table \
  --schema schema.json \
  --time_partitioning_field createdAt \
  --time_partitioning_type MONTH \
  myproject:mydataset.MyTable
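To double-check what partitioning specification the table actually carries, its metadata can be inspected with `bq show` (a quick sanity check, assuming the table above exists in your project):

```shell
# Inspect the destination table's metadata; the "timePartitioning" block
# in the JSON output should report type MONTH and field createdAt.
bq show --format=prettyjson myproject:mydataset.MyTable
```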
The data transfer job was created with the Python BigQuery Data Transfer Service client, like this:
from google.cloud import bigquery_datatransfer_v1
from google.protobuf.struct_pb2 import Struct

client = bigquery_datatransfer_v1.DataTransferServiceClient()

parent = f"projects/myproject/locations/{location}"

# CSV load options for the Cloud Storage data source
baseparams = {
    "file_format": "CSV",
    "ignore_unknown_values": True,
    "field_delimiter": ",",
    "skip_leading_rows": "0",
    "allow_jagged_rows": True,
}

params = Struct()
params_content = baseparams.copy()
params_content["data_path_template"] = "gs://mybucket/**/*.csv"
params_content["destination_table_name_template"] = "MyTable"
params.update(params_content)

tc_dict = {
    "display_name": target_display_name,
    "destination_dataset_id": "mydataset",
    "data_source_id": "google_cloud_storage",
    "schedule": "every 24 hours",
    "params": params,
}
tc = bigquery_datatransfer_v1.types.TransferConfig(**tc_dict)
response = client.create_transfer_config(
    request={"parent": parent, "transfer_config": tc}
)
As you can see, no partitioning is specified in the job definition; it is only specified on the destination table, as it should be according to the documentation:
Partitioning options: Cloud Storage and Amazon S3 transfers can write to partitioned or non-partitioned destination tables. There are two types of table partitioning in BigQuery:
Partitioned tables: tables that are partitioned based on a column. The column type must be a TIMESTAMP or DATE column. If the destination table is partitioned on a column, you identify the partitioning column when you create the destination table and specify its schema.
This job had been running successfully for days, until last week (last successful run on 2020-11-04). Last night (2020-11-10), the job failed with the following error message:
Incompatible table partitioning specification. Destination table exists with partitioning specification interval(type:MONTH,field:createdAt), but transfer target partitioning specification is interval(type:DAY,field:createdAt). Please retry after updating either the destination table or the transfer partitioning specification.
I have tried recreating tables and jobs with this specification, and it indeed fails every time the destination table's partitioning type is MONTH. However, it still works if the partitioning type is DAY. What confuses me most is the phrase "the transfer partitioning specification", since no such parameter seems to exist in the documentation.
Is this a recent breaking API change in GCP that has not been documented yet?
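Since DAY-partitioned targets still work, one possible stopgap (not something I have deployed, just a sketch using the table and schema from the setup above) would be to recreate the destination table with DAY partitioning so the two specifications match; note that a table's partitioning cannot be changed in place, so existing data would have to be backed up and reloaded:

```shell
# Hypothetical workaround: drop and recreate the destination table with
# DAY partitioning to match what the transfer service currently expects.
# Back up the table contents first, as this deletes the data.
bq rm -t myproject:mydataset.MyTable
bq mk --table \
  --schema schema.json \
  --time_partitioning_field createdAt \
  --time_partitioning_type DAY \
  myproject:mydataset.MyTable
```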
After a few weeks of investigation and bug fixing on the GCP team's side, the problem has been resolved as of December 7th, 2020. It was indeed a bug in the BigQuery Transfer service.