Airflow 2: Job Not Found when transferring data from BigQuery into Cloud Storage
I am trying to migrate from Cloud Composer 1 to Cloud Composer 2 (i.e. from Airflow 1.10.15 to Airflow 2.2.5), and to load data from BigQuery into GCS using BigQueryToGCSOperator:
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import BigQueryToGCSOperator

# ...
BigQueryToGCSOperator(
    task_id='my-task',
    source_project_dataset_table='my-project-name.dataset-name.table-name',
    destination_cloud_storage_uris=['gs://my-bucket/another-path/*.jsonl'],
    export_format='NEWLINE_DELIMITED_JSON',
    compression=None,
    location='europe-west2',
)
This results in the following error:
[2022-06-07, 11:17:01 UTC] {taskinstance.py:1776} ERROR - Task failed with exception
Traceback (most recent call last):
File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/transfers/bigquery_to_gcs.py", line 141, in execute
job = hook.get_job(job_id=job_id).to_api_repr()
File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/common/hooks/base_google.py", line 439, in inner_wrapper
return func(self, *args, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 1492, in get_job
job = client.get_job(job_id=job_id, project=project_id, location=location)
File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 2066, in get_job
resource = self._call_api(
File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 782, in _call_api
return call()
File "/opt/python3.8/lib/python3.8/site-packages/google/api_core/retry.py", line 283, in retry_wrapped_func
return retry_target(
File "/opt/python3.8/lib/python3.8/site-packages/google/api_core/retry.py", line 190, in retry_target
return target()
File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/_http/__init__.py", line 494, in api_request
raise exceptions.from_http_response(response)
google.api_core.exceptions.NotFound: 404 GET https://bigquery.googleapis.com/bigquery/v2/projects/my-project-name/jobs/airflow_1654592634552749_1896245556bd824c71f31c79d28cdfbe?projection=full&prettyPrint=false: Not found: Job my-project-name:airflow_1654592634552749_1896245556bd824c71f31c79d28cdfbe
Any clues as to what the problem might be here, and why it doesn't work on Airflow 2.2.5 (even though the equivalent BigQueryToCloudStorageOperator works on Airflow 1.10.15 with Cloud Composer v1)?
Apparently, this seems to be a bug introduced in version v7.0.0 of apache-airflow-providers-google.
Also note that the file transfer from BQ to GCS actually succeeds, even though the task is reported as failed.
As a workaround, you can either revert to a working version of the provider (if possible), e.g. back to 6.8.0, or call the BQ API directly and get rid of BigQueryToGCSOperator.
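If you take the version-pinning route, on Cloud Composer the pin is typically declared in the environment's PyPI packages list (or in a requirements file for a self-managed Airflow). A minimal sketch, assuming 6.8.0 is otherwise compatible with the rest of your environment:

```
apache-airflow-providers-google==6.8.0
```

Note that downgrading a provider affects every Google operator and hook in that environment, so it is worth validating the pin in a staging environment first.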
For example:
from google.cloud import bigquery
from airflow.operators.python import PythonOperator


def load_bq_to_gcs():
    # bq_project_name, bq_dataset_name and bq_table_name are placeholders;
    # replace them with your own project/dataset/table names.
    client = bigquery.Client()

    job_config = bigquery.job.ExtractJobConfig()
    job_config.destination_format = bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON

    destination_uri = f"{<gcs-bucket-destination>}*.jsonl"
    dataset_ref = bigquery.DatasetReference(bq_project_name, bq_dataset_name)
    table_ref = dataset_ref.table(bq_table_name)

    extract_job = client.extract_table(
        table_ref,
        destination_uri,
        job_config=job_config,
        location='europe-west2',
    )
    # Wait for the extract job to complete; raises on failure.
    extract_job.result()
and then create an instance of PythonOperator:
PythonOperator(
    task_id='test_task',
    python_callable=load_bq_to_gcs,
)