
BigQuery data transfer failing if target table is not daily-partitioned

I have a BigQuery Data Transfer job set up to load into a destination table that is partitioned by month. The table was created with the following command:

bq mk --table \
  --schema schema.json \
  --time_partitioning_field createdAt \
  --time_partitioning_type MONTH \
  myproject:mydataset.MyTable
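Because the table uses MONTH instead of the default DAY granularity, every createdAt value in a given calendar month lands in the same partition. As an illustration only, this Python sketch maps a timestamp to a partition ID using BigQuery's partition ID formats (YYYYMMDD for DAY, YYYYMM for MONTH). The helper `partition_id` is hypothetical and not part of any Google client library.

```python
from datetime import datetime, timezone

# Partition ID formats BigQuery uses for time-unit column partitioning.
PARTITION_ID_FORMATS = {
    "HOUR": "%Y%m%d%H",
    "DAY": "%Y%m%d",
    "MONTH": "%Y%m",
    "YEAR": "%Y",
}


def partition_id(created_at: datetime, partitioning_type: str) -> str:
    """Return the ID of the partition that a createdAt value falls into (hypothetical helper)."""
    if partitioning_type not in PARTITION_ID_FORMATS:
        raise ValueError(f"unsupported partitioning type: {partitioning_type}")
    # TIMESTAMP values are partitioned by their UTC value; treat naive datetimes as UTC.
    if created_at.tzinfo is None:
        created_at = created_at.replace(tzinfo=timezone.utc)
    return created_at.astimezone(timezone.utc).strftime(PARTITION_ID_FORMATS[partitioning_type])


if __name__ == "__main__":
    ts = datetime(2020, 11, 4, 13, 30, tzinfo=timezone.utc)
    print(partition_id(ts, "DAY"))    # 20201104
    print(partition_id(ts, "MONTH"))  # 202011
```

With this table, all rows from November 2020 go to partition 202011, whereas a DAY-partitioned table would split them across 20201101 through 20201130.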

The transfer job was created with the Python BigQuery Data Transfer Service (BQDTS) client, like this:

from google.cloud import bigquery_datatransfer_v1
from google.protobuf.struct_pb2 import Struct

client = bigquery_datatransfer_v1.DataTransferServiceClient()

# `location` and `target_display_name` are assumed to be defined earlier (not shown in the question).
parent = f"projects/myproject/locations/{location}"
baseparams = {
    "file_format": "CSV",
    "ignore_unknown_values": True,
    "field_delimiter": ",",
    "skip_leading_rows": "0",
    "allow_jagged_rows": True,
}
params = Struct()
params_content = baseparams.copy()
params_content["data_path_template"] = "gs://mybucket/**/*.csv"
params_content["destination_table_name_template"] = "MyTable"
# Note: no partitioning option is set on the transfer itself.
params.update(params_content)

tc_dict = {
    "display_name": target_display_name,
    "destination_dataset_id": "mydataset",
    "data_source_id": "google_cloud_storage",
    "schedule": "every 24 hours",
    "params": params,
}
tc = bigquery_datatransfer_v1.types.TransferConfig(**tc_dict)
response = client.create_transfer_config(
    request={"parent": parent, "transfer_config": tc}
)

As you can see, no partitioning is specified in the transfer configuration; it is only specified on the destination table, which is what the documentation prescribes:

Partitioning options

Cloud Storage and Amazon S3 transfers can write to partitioned or non-partitioned destination tables. There are two types of table partitioning in BigQuery:

Partitioned tables: Tables that are partitioned based on a column. The column type must be a TIMESTAMP or DATE column. If the destination table is partitioned on a column, you identify the partitioning column when you create the destination table and specify its schema.

The job had been running successfully for days, until last week (last successful run on 2020-11-04). Last night (2020-11-10), it failed with the following error message:

Incompatible table partitioning specification. Destination table exists with partitioning specification interval(type:MONTH,field:createdAt), but transfer target partitioning specification is interval(type:DAY,field:createdAt). Please retry after updating either the destination table or the transfer partitioning specification.
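The error compares two partitioning specifications that differ only in their time unit. The following Python sketch, built around hypothetical helpers `parse_specs` and `differing_attributes` (not part of any Google library), extracts both specifications from the message text and reports which attributes differ. It assumes the message prints the destination table's specification first and the transfer's second, as it does here.

```python
import re
from typing import List, NamedTuple


class PartitionSpec(NamedTuple):
    """A time-unit partitioning specification as printed in the transfer error."""
    type: str
    field: str


# Matches fragments like "interval(type:MONTH,field:createdAt)".
_SPEC_RE = re.compile(r"interval\(type:(?P<type>[A-Z]+),field:(?P<field>[^)]*)\)")


def parse_specs(error_message: str) -> List[PartitionSpec]:
    """Extract every partitioning specification mentioned in the message, in order."""
    return [PartitionSpec(m["type"], m["field"]) for m in _SPEC_RE.finditer(error_message)]


def differing_attributes(error_message: str) -> List[str]:
    """Compare the destination-table spec (first) with the transfer spec (second)."""
    specs = parse_specs(error_message)
    if len(specs) != 2:
        raise ValueError("expected exactly two partitioning specifications")
    table_spec, transfer_spec = specs
    return [name for name in PartitionSpec._fields
            if getattr(table_spec, name) != getattr(transfer_spec, name)]


if __name__ == "__main__":
    msg = ("Destination table exists with partitioning specification "
           "interval(type:MONTH,field:createdAt), but transfer target partitioning "
           "specification is interval(type:DAY,field:createdAt).")
    print(parse_specs(msg))
    print(differing_attributes(msg))  # ['type']
```

Applied to the message above, only `type` differs: the table is MONTH-partitioned while the transfer targets DAY, both on `createdAt`.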

I have tried recreating the tables and jobs with this specification, and it fails every time the destination table's partitioning type is MONTH. It still works if the partitioning type is DAY. What confuses me most is the phrase "transfer partitioning specification", since no such parameter seems to exist in the documentation.

Is this a recent breaking change in the GCP API that has not been documented yet?

After a few weeks of investigation and bug fixing by the GCP team, the problem was fixed on December 7, 2020. It was indeed a bug in the BigQuery Data Transfer Service.
