Airflow ExternalTaskSensor Stuck

I am trying to get the Airflow ExternalTaskSensor to work, but so far I have not been able to get it to complete. It always gets stuck in the running state and never finishes, so the DAG cannot move on to the next task.

Here is the code I am using to test:


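# Imports assume Airflow 1.10.x module paths
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dagrun_operator import TriggerDagRunOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.sensors.external_task_sensor import ExternalTaskSensor

DEFAULT_ARGS = {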
    'owner': 'NAME',
    'depends_on_past': False,
    'start_date': datetime(2019, 9, 9),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False
}

external_watch_dag = DAG(
    'DAG-External_watcher-Test',
    default_args=DEFAULT_ARGS,
    dagrun_timeout=timedelta(hours=1),
    schedule_interval=None
)

start_op = DummyOperator(
    task_id='start_op',
    dag=external_watch_dag
)


trigger_external = TriggerDagRunOperator(
    task_id='trigger_external',
    trigger_dag_id='DAG-Dummy',
    dag=external_watch_dag
)

external_watch_op = ExternalTaskSensor(
    task_id='external_watch_op',
    external_dag_id='DAG-Dummy',
    external_task_id='dummy_task',
    check_existence=True,
    execution_delta=timedelta(minutes=-1),
    # execution_date_fn=datetime(2019, 9, 25),
    execution_timeout=timedelta(minutes=30),
    dag=external_watch_dag
)

end_op = DummyOperator(
    task_id='end_op',
    dag=external_watch_dag
)

start_op >> trigger_external >> external_watch_op >> end_op
# start_op >> [external_watch_op, trigger_external]
# external_watch_op >> end_op


# Below is the setup for the dummy DAG that is called above by the Trigger and watched by the TaskSensor
dummy_dag = DAG(
    'DAG-Dummy',
    default_args=DEFAULT_ARGS,
    dagrun_timeout=timedelta(hours=1),
    schedule_interval=None
)

dummy_task = BashOperator(
    task_id='dummy_task',
    bash_command='sleep 10',
    dag=dummy_dag
)

I have tried tweaking this code in a number of ways but have had no success with the ExternalTaskSensor.

Does anyone know how to solve this problem and get the ExternalTaskSensor to work properly? I have also read that scheduling intervals can cause issues when using the ExternalTaskSensor. Is it possible that part of the issue is that both DAGs have schedule_interval=None?

I did get this to work with both DAGs set to the exact same schedule_interval, but that will not work in production. The goal is for the main DAG, external_watch_dag, to run on a regular schedule and trigger DAG-Dummy during its run, with DAG-Dummy itself having schedule_interval=None.

Any help is greatly appreciated.

By default, the ExternalTaskSensor monitors the external_dag_id run that has the same execution date as the sensor's own DAG run. With execution_delta you can specify a time difference between the sensor DAG and the external DAG, so that the sensor looks for the correct execution_date. This works well when both DAGs run on a schedule, because you know this time difference exactly.
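The date arithmetic described above can be sketched in plain Python. This is a simplified illustration, not the sensor's actual code: the helper name sensor_target_date is hypothetical, and it assumes the sensor subtracts execution_delta from its own execution date when a delta is given.

```python
from datetime import datetime, timedelta


def sensor_target_date(execution_date, execution_delta=None):
    """Hypothetical helper: the execution date the sensor would look for.

    Assumes the delta, when given, is subtracted from the sensor's
    own execution date; without a delta the two dates are identical.
    """
    if execution_delta is not None:
        return execution_date - execution_delta
    return execution_date


sensor_date = datetime(2019, 9, 9)

# No delta: the sensor looks for an external run with the same date.
same_date = sensor_target_date(sensor_date)

# The question uses execution_delta=timedelta(minutes=-1), which under
# this assumption points one minute *after* the sensor's own date.
shifted_date = sensor_target_date(sensor_date, timedelta(minutes=-1))

print(same_date, shifted_date)
```

Under this reading, the negative delta in the question only matches an external run whose execution date is exactly one minute later than the sensor's, which a triggered run is unlikely to have.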

The problem: when a DAG is triggered manually or by another DAG, you cannot know for sure the exact execution date of either of the two DAGs.

The solution: because you are using the TriggerDagRunOperator, you can set its execution_date parameter. This makes sure that your DAG and the external DAG have the same execution date. From the docs:

execution_date (str or datetime.datetime) – Execution date for the dag (templated)

So your code will look like this (note that execution_delta is no longer needed on the sensor):

trigger_external = TriggerDagRunOperator(
    task_id='trigger_external',
    trigger_dag_id='DAG-Dummy',
    dag=external_watch_dag,
    execution_date="{{ execution_date }}",  # Use the template to get the current execution date
)

external_watch_op = ExternalTaskSensor(
    task_id='external_watch_op',
    external_dag_id='DAG-Dummy',
    external_task_id='dummy_task',
    check_existence=True,
    execution_timeout=timedelta(minutes=30),
    dag=external_watch_dag
)
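
To see why matching the execution dates matters, here is a toy model of the lookup in plain Python. It is only a sketch: task_states, trigger and poke are hypothetical stand-ins rather than Airflow APIs, the timestamps are made up, and it assumes that a triggered run is stamped with the time of the trigger when no execution_date is passed.

```python
from datetime import datetime

# Toy stand-in for the task-instance table:
# (dag_id, task_id, execution_date) -> state
task_states = {}


def trigger(dag_id, task_id, execution_date):
    """Record a finished run of the external task under the given date."""
    task_states[(dag_id, task_id, execution_date)] = "success"


def poke(dag_id, task_id, execution_date):
    """Return True only if a successful task exists for this exact date."""
    return task_states.get((dag_id, task_id, execution_date)) == "success"


parent_date = datetime(2019, 9, 9)              # execution date of the parent run
trigger_time = datetime(2019, 9, 10, 0, 0, 7)   # hypothetical time of the trigger

# Without execution_date: the child run is stamped with the trigger time,
# so the sensor, which looks for parent_date, never finds it.
trigger("DAG-Dummy", "dummy_task", trigger_time)
found_without = poke("DAG-Dummy", "dummy_task", parent_date)

# With the parent's execution date passed along: the child run carries
# the same date and the lookup succeeds.
trigger("DAG-Dummy", "dummy_task", parent_date)
found_with = poke("DAG-Dummy", "dummy_task", parent_date)

print(found_without, found_with)
```

In the first case the sensor keeps poking until it times out, which matches the "stuck" behaviour described in the question.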
