Variable passing throwing error in BigQueryInsertJobOperator in Airflow
I wrote a BigQueryInsertJobOperator in Airflow to select data and insert it into a BigQuery table, but I am having trouble passing variables. The DAG fails with this error when it runs:
File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/bigquery/job/query.py", line 911, in to_api_repr
    configuration = self._configuration.to_api_repr()
File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/bigquery/job/query.py", line 683, in to_api_repr
    query_parameters = resource["query"].get("queryParameters")
AttributeError: 'str' object has no attribute 'get'
Here is my operator code:
dag = DAG(
    'bq_to_sql_operator',
    default_args=default_args,
    schedule_interval="@daily",
    template_searchpath="/opt/airflow/dags/scripts",
    user_defined_macros={"BQ_PROJECT": BQ_PROJECT, "BQ_EDW_DATASET": BQ_EDW_DATASET, "BQ_STAGING_DATASET": BQ_STAGING_DATASET},
    catchup=False
)
t1 = BigQueryInsertJobOperator(
    task_id='bq_write_to_umc_cg_service_agg_stg',
    configuration={
        "query": "{% include 'umc_cg_service_agg_stg.sql' %}",
        "useLegacySql": False,
        "allow_large_results": True,
        "writeDisposition": "WRITE_TRUNCATE",
        "destinationTable": {
            'projectId': BQ_PROJECT,
            'datasetId': BQ_STAGING_DATASET,
            'tableId': UMC_CG_SERVICE_AGG_STG_TABLE_NAME
        }
    },
    params={'BQ_PROJECT': BQ_PROJECT, 'BQ_EDW_DATASET': BQ_EDW_DATASET, 'BQ_STAGING_DATASET': BQ_STAGING_DATASET},
    gcp_conn_id=BQ_CONN_ID,
    location=BQ_LOCATION,
    dag=dag
)
My SQL file looks like this:
select
faccs2.employer_key employer_key,
faccs2.service_name service_name,
gender,
approximate_age_band,
state,
relationship_map_name,
account_attribute1_name,
account_attribute1_value,
account_attribute2_name,
account_attribute2_value,
account_attribute3_name,
account_attribute3_value,
account_attribute4_name,
account_attribute4_value,
account_attribute5_name,
account_attribute5_value,
count(distinct faccs2.sf_service_id) total_service_count
from `{{params.BQ_PROJECT}}.{{params.BQ_EDW_DATASET}}.fact_account_cg_case_survey` faccs
inner join `{{params.BQ_PROJECT}}.{{params.BQ_EDW_DATASET}}.fact_account_cg_case_service` faccs2 on faccs.sf_case_id = faccs2.sf_case_id
inner join `{{params.BQ_PROJECT}}.{{params.BQ_EDW_DATASET}}.dim_account` da on faccs2.account_key = da.account_key
left join `{{params.BQ_PROJECT}}.{{params.BQ_STAGING_DATASET}}.stg_account_selected_attr_tmp2` attr on faccs.account_key = attr.account_key
where not da.is_test_account_flag
and attr.gender is not null
and coalesce(faccs.case_status,'abc') <> 'Closed as Duplicate'
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16;
Can someone help me resolve this?
I think the query configuration should be nested inside a dictionary under the key query:
t1 = BigQueryInsertJobOperator(
    task_id='bq_write_to_umc_cg_service_agg_stg',
    configuration={
        "query": {
            "query": "{% include 'umc_cg_service_agg_stg.sql' %}",
            "useLegacySql": False,
            # The REST API field is camelCase: allowLargeResults
            "allowLargeResults": True,
            "writeDisposition": "WRITE_TRUNCATE",
            "destinationTable": {
                'projectId': BQ_PROJECT,
                'datasetId': BQ_STAGING_DATASET,
                'tableId': UMC_CG_SERVICE_AGG_STG_TABLE_NAME
            }
        }
    },
    params={'BQ_PROJECT': BQ_PROJECT, 'BQ_EDW_DATASET': BQ_EDW_DATASET, 'BQ_STAGING_DATASET': BQ_STAGING_DATASET},
    gcp_conn_id=BQ_CONN_ID,
    location=BQ_LOCATION,
    dag=dag
)
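With the configuration dictionary you provided, the internal method tries to read queryParameters, which should live inside the dictionary configuration["query"], but it finds a str there instead of a dict. That is why the call to .get() raises the AttributeError.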
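The failure can be reproduced without Airflow or BigQuery. The following is a minimal sketch, assuming a hypothetical helper read_query_parameters that only mimics the single lookup shown in the traceback; it is not the library's actual function.

```python
def read_query_parameters(resource):
    """Mimic the lookup from the traceback (hypothetical helper, not library code)."""
    return resource["query"].get("queryParameters")


# Flat configuration, as in the question: "query" maps directly to the SQL string.
flat_config = {"query": "SELECT 1", "useLegacySql": False}

# Nested configuration: "query" maps to a dict holding the SQL and its options.
nested_config = {"query": {"query": "SELECT 1", "useLegacySql": False}}

try:
    read_query_parameters(flat_config)
    flat_error = None
except AttributeError as exc:
    # A str has no .get() method, so the lookup fails here.
    flat_error = str(exc)

# With the nested form the lookup succeeds; no parameters were supplied.
nested_result = read_query_parameters(nested_config)

print(flat_error)
print(nested_result)
```

The flat form fails because a str has no .get() method, while the nested form simply returns None when no queryParameters are present.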
Consider the script below, which I use at work.
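# Assumed import: the original snippet only shows the alias bq
from airflow.providers.google.cloud.operators import bigquery as bq

# Jinja template; Airflow renders it to the run date as YYYYMMDD
target_date = '{{ ds_nodash }}'
...
# DAG task
t1 = bq.BigQueryInsertJobOperator(
    task_id='sample_task',
    configuration={
        "query": {
            # Doubled braces in the f-string produce a literal {% include ... %}
            "query": f"{{% include 'your_query_file.sql' %}}",
            "useLegacySql": False,
            "queryParameters": [
                {
                    "name": "target_date",
                    "parameterType": {"type": "STRING"},
                    "parameterValue": {"value": f"{target_date}"}
                }
            ],
            "parameterMode": "NAMED"
        },
    },
    location='asia-northeast3',
)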
-- in your_query_file.sql, @target_date value is passed as a named parameter.
DECLARE target_date DATE DEFAULT SAFE.PARSE_DATE('%Y%m%d', @target_date);
SELECT ... FROM ... WHERE partitioned_at = target_date;
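You can refer to the link below for the specification of the configuration JSON fields.
parameterMode string
Standard SQL only. Set to POSITIONAL to use positional (?) query parameters or to NAMED to use named (@myparam) query parameters in this query.
queryParameters[] object (QueryParameter)
jobs.query parameters for standard SQL queries.
queryParameters is an array of QueryParameter objects with the following JSON format.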
{
  "name": string,
  "parameterType": {
    object (QueryParameterType)
  },
  "parameterValue": {
    object (QueryParameterValue)
  }
}
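To make the format above concrete, here is a small sketch that builds a queryParameters list for NAMED mode from a plain dict. The helper name build_named_string_params is hypothetical, and it assumes every value is passed as a STRING, as in the script above; the date value is only a placeholder.

```python
def build_named_string_params(values):
    """Turn {"name": "value"} pairs into QueryParameter-shaped dicts (hypothetical helper)."""
    return [
        {
            "name": name,
            "parameterType": {"type": "STRING"},
            "parameterValue": {"value": str(value)},
        }
        for name, value in values.items()
    ]


# The "query" section of the job configuration, with named parameters attached.
query_config = {
    "query": "SELECT @target_date AS d",
    "useLegacySql": False,
    "parameterMode": "NAMED",
    "queryParameters": build_named_string_params({"target_date": "20240101"}),
}

# This dict has the nested shape expected for the operator's configuration argument.
configuration = {"query": query_config}
```

Keeping the parameter construction in one helper makes it harder to forget the nested "query" key or to mix up parameterType and parameterValue.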