How to use the result of a query (BigQueryOperator) in another task - Airflow

I have a project in Google Composer that is meant to submit a report on a daily basis. The code below does that, and it works fine.
with models.DAG('reporte_prueba',
                schedule_interval=datetime.timedelta(weeks=4),
                default_args=default_dag_args) as dag:

    make_bq_dataset = bash_operator.BashOperator(
        task_id='make_bq_dataset',
        # Executing 'bq' command requires Google Cloud SDK which comes
        # preinstalled in Cloud Composer.
        bash_command='bq ls {} || bq mk {}'.format(
            bq_dataset_name, bq_dataset_name))

    bq_audit_query = bigquery_operator.BigQueryOperator(
        task_id='bq_audit_query',
        sql=query_sql,
        use_legacy_sql=False,
        destination_dataset_table=bq_destination_table_name)

    export_audits_to_gcs = bigquery_to_gcs.BigQueryToCloudStorageOperator(
        task_id='export_audits_to_gcs',
        source_project_dataset_table=bq_destination_table_name,
        destination_cloud_storage_uris=[output_file],
        export_format='CSV')

    download_file = GCSToLocalFilesystemOperator(
        task_id='download_file',
        object_name='audits.csv',
        bucket='bucket-reportes',
        filename='/home/airflow/gcs/data/audits.csv',
    )

    email_summary = email_operator.EmailOperator(
        task_id='email_summary',
        to=['aa@bb.cl'],
        subject="""Reporte de Auditorías Diarias
        Institución: {institution_report} día {date_report}
        """.format(date_report=date, institution_report=institution),
        html_content="""
        Sres.
        <br>
        Adjunto enviamos archivo con Reporte Transacciones Diarias.
        <br>
        """,
        files=['/home/airflow/gcs/data/audits.csv'])

    delete_bq_table = bash_operator.BashOperator(
        task_id='delete_bq_table',
        bash_command='bq rm -f %s' % bq_destination_table_name,
        trigger_rule=trigger_rule.TriggerRule.ALL_DONE)

    (
        make_bq_dataset
        >> bq_audit_query
        >> export_audits_to_gcs
        >> delete_bq_table
    )
    export_audits_to_gcs >> download_file >> email_summary
With this code, I create a table (which is later deleted) holding the data that I need to send, then export that table to Cloud Storage as a CSV. I then download the .csv to the local Airflow directory to send it by email.

The question I have is whether I can avoid the part where I create the table and export it to Storage, since I don't need it.
For example: execute the query with BigQueryOperator, access the result in Airflow, generate the CSV locally, and then send it.

I have a way to generate the CSV, but my biggest doubt is how (if it is possible) to access the result of the query, or to pass the result to another Airflow task.
Though I wouldn't recommend passing the results of SQL queries across tasks, XComs in Airflow are generally used for communication between tasks:

https://airflow.apache.org/docs/apache-airflow/stable/concepts/xcoms.html

You would also need to create a custom operator to return query results, as I "believe" BigQueryOperator doesn't return query results.
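As a minimal sketch of that idea (not tested against your project): instead of a custom operator, you could run the query with the google-cloud-bigquery client inside a PythonOperator callable, write the CSV locally yourself, and push only small values (e.g. the row count) through XCom. The names `query_sql` and the output path are taken from your question; `fetch_audits_to_csv` and `rows_to_csv` are hypothetical names for this example.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize an iterable of row mappings to CSV text (header included)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(dict(row))
    return buf.getvalue()


def fetch_audits_to_csv(**context):
    """Run the query and write the CSV locally, skipping the table/GCS hop."""
    # google-cloud-bigquery comes preinstalled in Cloud Composer.
    from google.cloud import bigquery
    client = bigquery.Client()
    result = client.query(query_sql).result()  # RowIterator of bigquery.Row
    fieldnames = [field.name for field in result.schema]
    with open('/home/airflow/gcs/data/audits.csv', 'w') as f:
        f.write(rows_to_csv(result, fieldnames))
    # Keep XCom payloads small: push metadata, not the full result set.
    context['ti'].xcom_push(key='row_count', value=result.total_rows)


# Wired into the DAG in place of bq_audit_query / export / download:
# fetch_audits = python_operator.PythonOperator(
#     task_id='fetch_audits',
#     python_callable=fetch_audits_to_csv)
# make_bq_dataset >> fetch_audits >> email_summary
```

This keeps the heavy data on disk and uses XCom only for bookkeeping, which avoids stuffing query results into the Airflow metadata database.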