
Airflow DAGS Orchestration

Can anyone help me out? I have three DAGs (DAG1, DAG2 and DAG3). DAG1 runs on a monthly schedule. DAG2 and DAG3 must not be scheduled on their own and must run only after DAG1 has completed successfully; that is, once DAG1 completes, DAG2 and DAG3 should start in parallel. What is the best mechanism for this? I came across the TriggerDagRunOperator and ExternalTaskSensor options. It would be great if someone could explain the pros and cons of each and say which one fits best. I have seen a few questions about this, but I am looking for an answer for the latest stable Airflow version. Thanks in advance!

ExternalTaskSensor is not relevant for your use case, as none of the DAGs you mention needs to wait for another DAG: DAG2 and DAG3 have no schedule of their own, so they only need to be started once DAG1 is done.

Add TriggerDagRunOperator tasks to the code of DAG1 to trigger the DAG runs for DAG2 and DAG3.

A skeleton of the solution would be:

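# Import paths assume Airflow 2.x; start_date is a placeholder value.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

# DAG2 and DAG3 have no schedule; they run only when triggered.
dag2 = DAG(dag_id="DAG2", start_date=datetime(2021, 1, 1), schedule_interval=None)
dag3 = DAG(dag_id="DAG3", start_date=datetime(2021, 1, 1), schedule_interval=None)

with DAG(dag_id="DAG1", start_date=datetime(2021, 1, 1), schedule_interval="@monthly") as dag1:
    op_first = DummyOperator(task_id="first")  # Replace with the operators of your DAG
    op_trig2 = TriggerDagRunOperator(task_id="trigger_dag2", trigger_dag_id="DAG2")
    op_trig3 = TriggerDagRunOperator(task_id="trigger_dag3", trigger_dag_id="DAG3")

    # Both trigger tasks start in parallel once op_first succeeds.
    op_first >> [op_trig2, op_trig3]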
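
By default a trigger task succeeds as soon as the downstream run has been created, so DAG1 does not wait for DAG2 or DAG3 to finish. If DAG1 should reflect their outcome, TriggerDagRunOperator in Airflow 2.x accepts wait_for_completion and poke_interval. The following is only a sketch under assumptions: the helper names trigger_kwargs and build_dag1, the 60-second interval, and the start_date are illustrative and not part of the original answer, and the import paths assume Airflow 2.x.

```python
from datetime import datetime

DOWNSTREAM_DAG_IDS = ["DAG2", "DAG3"]  # DAG ids taken from the question


def trigger_kwargs(dag_id, wait=False):
    """Build keyword arguments for one TriggerDagRunOperator (hypothetical helper)."""
    kwargs = {"task_id": f"trigger_{dag_id.lower()}", "trigger_dag_id": dag_id}
    if wait:
        # Keep the trigger task running until the triggered DAG run finishes,
        # checking its state every 60 seconds (an assumed, illustrative value).
        kwargs.update(wait_for_completion=True, poke_interval=60)
    return kwargs


def build_dag1(wait=False):
    """Create DAG1 with one trigger task per downstream DAG (hypothetical helper)."""
    # Imported here so the helper above can be used without Airflow installed.
    # These import paths assume Airflow 2.x.
    from airflow import DAG
    from airflow.operators.dummy import DummyOperator
    from airflow.operators.trigger_dagrun import TriggerDagRunOperator

    with DAG(
        dag_id="DAG1",
        start_date=datetime(2021, 1, 1),  # placeholder value
        schedule_interval="@monthly",
    ) as dag:
        first = DummyOperator(task_id="first")  # stand-in for DAG1's real work
        triggers = [
            TriggerDagRunOperator(**trigger_kwargs(dag_id, wait))
            for dag_id in DOWNSTREAM_DAG_IDS
        ]
        first >> triggers  # all triggers start in parallel after "first"
    return dag
```

In a DAG file you would still assign the result at module level, for example dag1 = build_dag1(wait=True), so that the scheduler can discover it. Waiting keeps a worker slot occupied for each trigger task while the downstream DAG runs, which is the main cost of this variant.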

