BigQuery external table operator uses wrong schema path
Here is a snippet from a DAG that I am working on:
from airflow.contrib.operators import bigquery_operator

create_ext_table = bigquery_operator.BigQueryCreateExternalTableOperator(
    task_id='create_ext_table',
    bucket='bucket-a',
    source_objects=['path/*'],  # source_objects expects a list of GCS paths
    schema_object='bucket-b/data/schema.json',
    destination_project_dataset_table='sandbox.write_to_BQ',
    source_format='CSV',
    field_delimiter=';')

create_ext_table
When I run the code, I get the following error on Composer 1.10.10+composer:
404 GET https://storage.googleapis.com/download/storage/v1/b/bucket-a/o/bucket-b%2Fdata%2Fschema.json?alt=media: (u'Request failed with status code', 404, u'Expected one of', 200, 206)
As seen in the error, Airflow concatenates the bucket param with the schema_object param ... Is there any workaround for this? I cannot store the table schema and the table files in the same bucket.

Thanks
This is expected: as you can see in the source code for the operator here, the bucket argument is used to retrieve the schema_object, so the operator assumes you have both in the same bucket.
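The 404 in the question can be reproduced by joining the two arguments the way the error message shows. This is a minimal sketch using only the names from the question (it is not the operator's actual code); it illustrates how the leading "bucket-b/" of schema_object ends up percent-encoded inside the object path of bucket-a:

```python
from urllib.parse import quote

# Values taken from the question's DAG snippet.
bucket = 'bucket-a'
schema_object = 'bucket-b/data/schema.json'

# The GCS JSON API percent-encodes the object name, so the '/' characters
# in schema_object become %2F and the whole string is treated as one object
# inside bucket-a, which does not exist -> 404.
url = ('https://storage.googleapis.com/download/storage/v1/b/'
       + bucket + '/o/' + quote(schema_object, safe='') + '?alt=media')
print(url)
```

The printed URL matches the one in the error message, which confirms that the bucket prefix in schema_object is never interpreted as a bucket name.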
As you mentioned you cannot store them together, there are a few workarounds you can try; I'll describe them at a high level:
- Create your own version of the operator with a modified execute method in which you retrieve the schema from the bucket you care about. This requires handling the schema_object argument differently from the way the source code handles it, namely parsing it for the bucket name and the object path and then retrieving the object. Alternatively, add a new argument (say, schema_bucket) and use it in a similar manner.
- Before creating the external table, move the schema object into bucket-a using GoogleCloudStorageToGoogleCloudStorageOperator.
You can then delete this object using GoogleCloudStorageDeleteOperator as a downstream task after creating the external table, so it does not have to be persisted in bucket-a.

Final note on the schema_object argument: it is meant to be the path within GCS, since it uses the same bucket, so if you use the already defined operator it should be schema_object='data/schema.json'.
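If you go with the first workaround, the parsing step could be sketched as below. Note that split_schema_object is a hypothetical helper for illustration, not part of Airflow; your custom operator's execute would use the returned bucket instead of self.bucket when downloading the schema:

```python
# Hypothetical helper: split a 'bucket/path/to/object' style schema_object
# into the bucket name and the object path inside that bucket.
def split_schema_object(schema_object):
    bucket, _, object_path = schema_object.partition('/')
    return bucket, object_path

# With the value from the question:
schema_bucket, schema_path = split_schema_object('bucket-b/data/schema.json')
print(schema_bucket, schema_path)
```

A custom operator would then download the schema from schema_bucket rather than from the bucket argument used for the source data.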