Airflow - how can I get data from a BigQuery table and use it as a list?
I'm trying to get a column, then use its values to create file names.
I've tried the following, which should create a CSV named after the first value in the specified column. However, when I try to use the list, it says the list is empty.
bq_data = []

get_data = BigQueryGetDataOperator(
    task_id='get_data_from_bq',
    dataset_id='SK22',
    table_id='current_times',
    max_results='100',
    selected_fields='current_timestamps',
)

def process_data_from_bq(**kwargs):
    ti = kwargs['ti']
    global bq_data
    bq_data = ti.xcom_pull(task_ids='get_data_from_bq')

process_data = PythonOperator(
    task_id='process_data_from_bq',
    python_callable=process_data_from_bq,
    provide_context=True)

run_export = BigQueryToCloudStorageOperator(
    task_id=f"save_data_on_storage{str(bq_data[0])}",
    source_project_dataset_table="a-data-set",
    destination_cloud_storage_uris=[f"gs://europe-west1-airflow-bucket/data/test{bq_data[0]}.csv"],
    export_format="CSV",
    field_delimiter=",",
    print_header=False,
    dag=dag,
)

get_data >> process_data >> run_export
I don't think you need a PythonOperator between BigQueryGetDataOperator and BigQueryToCloudStorageOperator. You can use an XCom pull directly in BigQueryToCloudStorageOperator:
get_data = BigQueryGetDataOperator(
    task_id='get_data_from_bq',
    dataset_id='SK22',
    table_id='current_times',
    max_results='100',
    selected_fields='current_timestamps',
)

run_export = BigQueryToCloudStorageOperator(
    task_id="save_data_on_storage",
    source_project_dataset_table="a-data-set",
    # The Jinja expression is rendered when the task runs, not when the DAG file is parsed.
    # [0] selects the first row; if each row comes back as a list, use [0][0] for the first value.
    destination_cloud_storage_uris=["gs://europe-west1-airflow-bucket/data/test{{ ti.xcom_pull(task_ids='get_data_from_bq')[0] }}.csv"],
    export_format="CSV",
    field_delimiter=",",
    print_header=False,
    dag=dag,
)

get_data >> run_export
destination_cloud_storage_uris is a templated parameter, so you can pass Jinja template syntax inside it.
I haven't tested the syntax, but it should work.
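To see what the templated URI could render to at run time, here is a minimal sketch that renders the same Jinja expression outside Airflow. It assumes the `jinja2` package is installed. `FakeTaskInstance` and `render_uri` are hypothetical names used only for illustration, and the list-of-rows return value is an assumption about what BigQueryGetDataOperator pushes to XCom, so check the real shape in your environment.

```python
from jinja2 import Template


class FakeTaskInstance:
    """Hypothetical stand-in for Airflow's task instance (not the real class)."""

    def xcom_pull(self, task_ids):
        # Assumed XCom shape: one inner list per row, one item per selected field.
        return [["2022-01-01"], ["2022-01-02"]]


def render_uri(expression: str) -> str:
    """Render a destination URI whose file name comes from a Jinja expression."""
    template = Template(
        "gs://europe-west1-airflow-bucket/data/test{{ " + expression + " }}.csv"
    )
    return template.render(ti=FakeTaskInstance())


# [0] yields the whole first row, so the list itself ends up in the file name.
first_row = render_uri("ti.xcom_pull(task_ids='get_data_from_bq')[0]")
# [0][0] yields the first value of the first row.
first_value = render_uri("ti.xcom_pull(task_ids='get_data_from_bq')[0][0]")

print(first_row)
print(first_value)
```

Under that assumed shape, only the `[0][0]` variant gives a clean file name, which is worth checking before relying on the template.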
I also don't recommend using a global variable like bq_data to pass data between operators, because it doesn't work: the operator arguments are evaluated when the DAG file is parsed, before any task has run, and each task runs in its own process. You need to use XCom directly in the operator instead (through a Jinja template or by accessing the operator's current Context).
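The parse-time problem can be reproduced without Airflow. The sketch below is plain Python that mimics the order of events in the question: the f-string is evaluated while the list is still empty, because the callable that fills it has not run yet. `build_task_id` is a hypothetical helper added for illustration. In a real deployment the callable would also run in a separate process, so the global would stay empty in the parsing process even afterwards.

```python
bq_data = []  # module-level list, still empty while the DAG file is being parsed


def process_data_from_bq():
    """Mimics the callable: it only fills bq_data when a task actually runs."""
    global bq_data
    bq_data = ["2022-01-01"]


def build_task_id():
    # Same indexing as in the question's task_id f-string.
    return f"save_data_on_storage{bq_data[0]}"


# "Parse time": operator arguments are evaluated before any task has run,
# so process_data_from_bq() has not been called yet.
try:
    parse_time_result = build_task_id()
except IndexError as exc:
    parse_time_result = f"IndexError: {exc}"

print(parse_time_result)  # IndexError: list index out of range
```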
I also noticed that you are not using the latest Airflow operators: BigQueryToCloudStorageOperator has been superseded by BigQueryToGCSOperator from the Google provider package.
If you want to use the whole list provided by BigQueryGetDataOperator and compute a list of destination URIs from it, here is another solution:
from typing import Dict, List

from airflow.providers.google.cloud.transfers.bigquery_to_gcs import BigQueryToGCSOperator


class CustomBigQueryToGCSOperator(BigQueryToGCSOperator):

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)

    def execute(self, context):
        # Pull the rows pushed to XCom by the upstream task
        task_instance = context['task_instance']
        data_from_bq: List[Dict] = task_instance.xcom_pull(task_ids='get_data_from_bq')

        # Build one destination URI per row and set it on the operator before exporting
        self.destination_cloud_storage_uris = [self.to_destination_cloud_storage_uris(row) for row in data_from_bq]
        return super().execute(context)

    def to_destination_cloud_storage_uris(self, data_from_bq: Dict) -> str:
        # 'your_field' is a placeholder for the name of the selected column
        return f"gs://europe-west1-airflow-bucket/data/test{data_from_bq['your_field']}.csv"
Some explanations:
- I created a custom operator that extends BigQueryToGCSOperator.
- In the execute method, I have access to the current context of the operator.
- From the current context, I retrieve the list from BQ provided by the BigQueryGetDataOperator. I assume it's a list of Dict, but you have to confirm this.
- I compute a list of destination GCS URIs from this list of Dict.
- I assign the computed destination GCS URIs to the corresponding field in the operator.

The advantage of this solution is that you have more flexibility to apply logic based on the XCom value.
The drawback is that it's a little verbose.
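Since the shape of the XCom value has to be confirmed, the URI computation could be written so that it tolerates both possibilities. The following is a minimal sketch, not part of the original answer: `build_destination_uris` and its `position` parameter are hypothetical, and it assumes each row is either a dict keyed by column name or a list of column values.

```python
from collections.abc import Mapping
from typing import Any, List, Sequence, Union

Row = Union[Mapping, Sequence[Any]]


def build_destination_uris(rows: List[Row], field: str, position: int = 0) -> List[str]:
    """Build one GCS URI per row, for dict-shaped or list-shaped rows."""
    uris = []
    for row in rows:
        # Dict rows are read by column name, list rows by column position.
        value = row[field] if isinstance(row, Mapping) else row[position]
        uris.append(f"gs://europe-west1-airflow-bucket/data/test{value}.csv")
    return uris


# Both assumed shapes produce the same URIs.
print(build_destination_uris([{"current_timestamps": "a"}, {"current_timestamps": "b"}], "current_timestamps"))
print(build_destination_uris([["a"], ["b"]], "current_timestamps"))
```

A helper like this could be called from the custom operator's execute method in place of the per-row method, once the actual row shape is known.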