
How to display Airflow DAG status in Big Query tables

I want to show the final status (success/failure) of an Airflow DAG in a BigQuery table. The table could contain columns such as Date-Time, DAG-Name, and Status, and would be populated according to the final status of each DAG run.

How can this be achieved?

There's no native, out-of-the-box way to achieve this in Airflow. However, you could implement a function yourself that writes the data to BigQuery and run it via a DAG's on_success_callback and on_failure_callback.
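As a sketch of that approach (the project, dataset, and table names below are illustrative, and the target table is assumed to already exist):

```python
from datetime import datetime, timezone

def build_status_row(context, status):
    """Build the row to insert from the Airflow callback context."""
    return {
        "run_ts": datetime.now(timezone.utc).isoformat(),
        "dag_name": context["dag"].dag_id,
        "status": status,
    }

def record_status(context, status):
    # Imported here so the DAG file still parses if the library is missing.
    from google.cloud import bigquery
    client = bigquery.Client()
    # "my-project.airflow_audit.dag_status" is an assumed, pre-created table.
    errors = client.insert_rows_json(
        "my-project.airflow_audit.dag_status",
        [build_status_row(context, status)],
    )
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")

# Attach the callbacks at the DAG level:
# DAG(..., on_success_callback=lambda ctx: record_status(ctx, "success"),
#          on_failure_callback=lambda ctx: record_status(ctx, "failure"))
```

The callbacks receive the task context, so the same helper works for both success and failure; only the status string differs.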

Note: BigQuery is not a transactional database and has limits on the number of inserts per day. For a large number of DAG runs, you might want to write results to BigQuery in batches.

If you need the data in real time, I would go with something along the lines of the approach @Bas has suggested, perhaps with Firestore or Cloud SQL. However, note his comments on the inserts-per-day limit if you go with BigQuery.

If you can wait for the results on a daily basis, you can set up a log sink to BigQuery as described here: https://cloud.google.com/bigquery/docs/reference/auditlogs#stackdriver_logging_exports

In the filter criteria you can either bring in all of the Airflow logs or just the ones from the worker/scheduler.

Example criteria:

resource.type="cloud_composer_environment"
logName="projects/{YOUR-PROJECT}/logs/airflow-worker"

In the log textPayload you will see something like:

Marking task as SUCCESS. dag_id=thing, task_id=stuff, execution_date=20220307T111111, start_date=20220307T114858, end_date=20220307T114859

You can then parse out what you need in BigQuery.
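For illustration, the regular expression below pulls the status, dag_id, and task_id out of such a textPayload line; the same pattern can be used in BigQuery with REGEXP_EXTRACT (RE2 syntax):

```python
import re

# Illustrative parse of the "Marking task as ..." textPayload shown above.
LOG_PATTERN = re.compile(
    r"Marking task as (?P<status>\w+)\. "
    r"dag_id=(?P<dag_id>[\w-]+), task_id=(?P<task_id>[\w-]+)"
)

def parse_status_line(text_payload):
    """Return a dict of status/dag_id/task_id, or None if the line doesn't match."""
    m = LOG_PATTERN.search(text_payload)
    return m.groupdict() if m else None
```

In BigQuery SQL the equivalent would be one REGEXP_EXTRACT call per field over the textPayload column of the sink table.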

To complement the answer of user Bas Harenslak, there are also these options you can explore:

  • You can make use of TriggerDagRunOperator. With it, your DAGs can reference a single DAG (a recap-dag) that populates the record into your destination dataset.
trigger_recap_dag = TriggerDagRunOperator(
      task_id="trigger_recap_dag",
      trigger_dag_id="recap-dag",
      wait_for_completion=False,
      allowed_states=["success"],
      # conf must be JSON-serializable, so pass the timestamp as a string
      conf={"Time": datetime.now().isoformat(), "DAG": "recap-dag", "Status": "success"},
  )

ingestion >> transformation >> save >> send_notification >> trigger_recap_dag 
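On the receiving side, the recap-dag can read the payload passed via conf from dag_run.conf. A minimal sketch of such a callable (the actual insert step is left as a comment, since the destination is up to you):

```python
def save_status(**context):
    """PythonOperator callable inside the recap-dag: read the triggering conf."""
    conf = context["dag_run"].conf or {}
    row = {
        "run_ts": conf.get("Time"),
        "dag_name": conf.get("DAG"),
        "status": conf.get("Status"),
    }
    # e.g. write `row` to BigQuery, Cloud SQL, or a file here
    return row
```

With provide_context available by default in Airflow 2, the dag_run object (and hence the conf dict from the trigger) is injected into the callable's keyword arguments.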
  • If you see fit, this recap-dag can also be independent, running only at the hourly/daily/weekly interval of your choosing to check your DAGs' statuses.
from datetime import datetime

from airflow import DAG
from airflow.models import DagRun

with DAG(
  'recap-dag',
  schedule_interval='@daily',
  start_date=datetime(2021, 1, 1),
  catchup=False,
) as dag:

  ...
  # Airflow >= 2.0.0
  # Inside a PythonOperator callable
  def get_running_dags_info():
     dag_runs = DagRun.find(
       dag_id=your_dag_id,
       execution_start_date=your_start_date,
       execution_end_date=your_end_date,
     )

  ...
  • You can combine the prior options into a solution like this:

After your DAG (or DAGs) complete, they fire the recap-dag, which saves your DAG records into a custom table or file. Then your independent DAG runs, retrieves the records that have been created so far, and pushes the data into your BigQuery table.

  • Another option is to look into your Airflow metadata database to retrieve run information, known as Data Profiling. It has been deprecated in recent versions due to security concerns.
