
How to use the result of a query (BigQuery operator) in another task - Airflow

I have a project in Google Composer that aims to submit a report on a daily basis. The code below does that, and it works fine.

with models.DAG('reporte_prueba',
    schedule_interval=datetime.timedelta(weeks=4),
    default_args=default_dag_args) as dag:

    make_bq_dataset = bash_operator.BashOperator(
        task_id='make_bq_dataset',
        # Executing 'bq' command requires Google Cloud SDK which comes
        # preinstalled in Cloud Composer.
        bash_command='bq ls {} || bq mk {}'.format(
            bq_dataset_name, bq_dataset_name))
        
    bq_audit_query = bigquery_operator.BigQueryOperator(
        task_id='bq_audit_query',
        sql=query_sql,
        use_legacy_sql=False,
        destination_dataset_table=bq_destination_table_name)

    export_audits_to_gcs = bigquery_to_gcs.BigQueryToCloudStorageOperator(
        task_id='export_audits_to_gcs',
        source_project_dataset_table=bq_destination_table_name,
        destination_cloud_storage_uris=[output_file],
        export_format='CSV')
    
    download_file = GCSToLocalFilesystemOperator(
        task_id="download_file",
        object_name='audits.csv',
        bucket='bucket-reportes',
        filename='/home/airflow/gcs/data/audits.csv',
    )
    email_summary = email_operator.EmailOperator(
        task_id='email_summary',
        to=['aa@bb.cl'],
        subject="""Reporte de Auditorías Diarias 
        Institución: {institution_report} día {date_report}
        """.format(date_report=date,institution_report=institution),
        html_content="""
        Sres.
        <br>
        Adjunto enviamos archivo con Reporte Transacciones Diarias.
        <br>
        """,
        files=['/home/airflow/gcs/data/audits.csv'])

    delete_bq_table = bash_operator.BashOperator(
        task_id='delete_bq_table',
        bash_command='bq rm -f %s' % bq_destination_table_name,
        trigger_rule=trigger_rule.TriggerRule.ALL_DONE)


    (
        make_bq_dataset 
        >> bq_audit_query 
        >> export_audits_to_gcs 
        >> delete_bq_table
    )
    export_audits_to_gcs >> download_file >> email_summary

With this code, I create a table (which is later deleted) with the data that I need to send, then I export that table to storage as a CSV. Then I download the .csv to the local Airflow directory to send it by mail.

The question I have is whether I can avoid the part where I create the table and export it to storage, since I don't need it.

For example, execute the query with BigQueryOperator and access the result in Airflow, thereby generating the CSV locally and then sending it.

I have a way to generate the CSV, but my biggest doubt is how (if it is possible) to access the result of the query, or to pass the result to another Airflow task.

Though I wouldn't recommend passing the results of SQL queries across tasks, XComs in Airflow are generally used for communication between tasks:

https://airflow.apache.org/docs/apache-airflow/stable/concepts/xcoms.html

Also, you would need to create a custom operator (or a PythonOperator callable) to return query results, as I believe BigQueryOperator doesn't return the query results itself.
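A minimal sketch of that approach: run the query directly with the `google.cloud.bigquery` client inside a PythonOperator callable, serialize the rows to CSV, and write the file to the local Airflow data folder, so neither the destination table nor the GCS export is needed. The `query_sql` variable and the `/home/airflow/gcs/data/audits.csv` path are taken from the question's DAG; the rest is an assumed setup, not the only way to wire it.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize query rows (a list of dicts) to a CSV string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def fetch_audits_to_csv(**context):
    # Hypothetical task callable: runs the audit query with the BigQuery
    # client library (preinstalled in Cloud Composer) and writes the CSV
    # where the EmailOperator already expects it.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = [dict(r) for r in client.query(query_sql).result()]
    fieldnames = list(rows[0]) if rows else []
    with open('/home/airflow/gcs/data/audits.csv', 'w', newline='') as f:
        f.write(rows_to_csv(rows, fieldnames))
```

You could then register it with a `PythonOperator` (`python_callable=fetch_audits_to_csv`) and place `email_summary` downstream of it, replacing the `bq_audit_query >> export_audits_to_gcs >> download_file` chain. If instead you want the rows themselves available to another task, return a small result from the callable and it is pushed to XCom automatically; keep in mind XCom is backed by the Airflow metadata database and is only meant for small payloads, which is why writing the CSV to disk is the safer route here.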
