Bigquery data transfer failing if target table is not daily-partitioned

I have a Bigquery data transfer job set up with a destination table which is partitioned by month. The table was created with the following command:

bq mk --table \                                                                              
  --schema schema.json \
  --time_partitioning_field createdAt \
  --time_partitioning_type MONTH \
  myproject:mydataset.MyTable
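
To double-check what partitioning spec actually landed on the table, it can be inspected with the google-cloud-bigquery client (a minimal sketch, using the same project and table names as above):

from google.cloud import bigquery

# Fetch the destination table and print its partitioning spec
bq_client = bigquery.Client(project="myproject")
table = bq_client.get_table("myproject.mydataset.MyTable")
# Expected output: TimePartitioning(field='createdAt', type_='MONTH')
print(table.time_partitioning)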

The data transfer job was created with the Python BigQuery Data Transfer Service (BQDTS) client, like this:

parent = f"projects/myproject/locations/{location}"
baseparams = {
    "file_format": "CSV",
    "ignore_unknown_values": True,
    "field_delimiter": ",",
    "skip_leading_rows": "0",
    "allow_jagged_rows": True,
}
params = Struct()
params_content = baseparams.copy()
params_content[
    "data_path_template"
] = f"gs://mybucket/**/*.csv"
params_content["destination_table_name_template"] = "MyTable"

params.update(params_content)
tc_dict = {
    "display_name": target_display_name,
    "destination_dataset_id": "mydataset",
    "data_source_id": "google_cloud_storage",
    "schedule": "every 24 hours",
    "params": params,
}
tc = bigquery_datatransfer_v1.types.TransferConfig(**tc_dict)
response = client.create_transfer_config(
    request={"parent": parent, "transfer_config": tc}
)
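
When a scheduled run fails, the run history and error details can be pulled back through the same client (a minimal sketch, assuming response from the creation call above is still in scope):

# List recent runs of this transfer config and print their error status
for run in client.list_transfer_runs(parent=response.name):
    print(run.run_time, run.state, run.error_status.message)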

As you can see, no partitioning is specified in the job definition; it is only specified on the destination table, as it should be according to the documentation:

Partitioning options: Cloud Storage and Amazon S3 transfers can write to partitioned or non-partitioned destination tables. There are two types of table partitioning in BigQuery:

Partitioned tables: Tables that are partitioned based on a column. The column type must be a TIMESTAMP or DATE column. If the destination table is partitioned on a column, you identify the partitioning column when you create the destination table and specify its schema.

This job had been running successfully for days, until last week (last successful run on 2020-11-04). Last night (2020-11-10), the job failed with the following error message:

Incompatible table partitioning specification. Destination table exists with partitioning specification interval(type:MONTH,field:createdAt), but transfer target partitioning specification is interval(type:DAY,field:createdAt). Please retry after updating either the destination table or the transfer partitioning specification.

I have tried to recreate tables and jobs with this specification, and it indeed fails every time the destination table's partitioning type is MONTH. However, it still works if the partitioning type is DAY. What confuses me the most is the phrase "the transfer partitioning specification": such a parameter does not seem to exist in the documentation.

Is this a recent breaking API change in GCP which has not been documented yet?

After a few weeks of investigation and bug fixing on the GCP team's side, the problem has been solved as of December 7th, 2020. It was indeed a bug in the BigQuery Transfer service.
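
For anyone hitting this before the fix rolled out: a workaround consistent with the error message (and with the observation above that DAY-partitioned destinations keep working) is to recreate the destination table with DAY partitioning. A minimal sketch with the google-cloud-bigquery client, equivalent to the bq mk command above with only the partitioning type changed:

from google.cloud import bigquery

bq_client = bigquery.Client(project="myproject")

# Rebuild the destination table with DAY instead of MONTH partitioning
schema = bq_client.schema_from_json("schema.json")
table = bigquery.Table("myproject.mydataset.MyTable", schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="createdAt"
)

# Caution: dropping the table discards its data; it must be reloaded afterwards
bq_client.delete_table(table, not_found_ok=True)
bq_client.create_table(table)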
