
Airflow ExternalTaskSensor doesn't fail when External Task fails

I was trying to use the ExternalTaskSensor in Airflow 1.10.11 to coordinate some DAGs. I developed this code to test the functionality:

import time
from datetime import datetime, timedelta
from pprint import pprint

from airflow import DAG
from airflow.operators.dagrun_operator import TriggerDagRunOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from airflow.sensors.external_task_sensor import ExternalTaskSensor
from airflow.utils.state import State

sensors_dag = DAG(
    "test_launch_sensors",
    schedule_interval=None,
    start_date=datetime(2020, 2, 14, 0, 0, 0),
    dagrun_timeout=timedelta(minutes=150),
    tags=["DEMO"],
)

dummy_dag = DAG(
    "test_dummy_dag",
    schedule_interval=None,
    start_date=datetime(2020, 2, 14, 0, 0, 0),
    dagrun_timeout=timedelta(minutes=150),
    tags=["DEMO"],
)


def print_context(ds, **context):
    pprint(context['conf'])


with dummy_dag:
    starts = DummyOperator(task_id="starts", dag=dummy_dag)
    empty = PythonOperator(
        task_id="empty",
        provide_context=True,
        python_callable=print_context,
        dag=dummy_dag,
    )
    ends = DummyOperator(task_id="ends", dag=dummy_dag)

    starts >> empty >> ends

with sensors_dag:
    trigger = TriggerDagRunOperator(
        task_id=f"trigger_{dummy_dag.dag_id}",
        trigger_dag_id=dummy_dag.dag_id,
        conf={"key": "value"},
        execution_date="{{ execution_date }}",
    )
    sensor = ExternalTaskSensor(
        task_id="wait_for_dag",
        external_dag_id=dummy_dag.dag_id,
        external_task_id="ends",
        failed_states=["failed", "upstream_failed"],
        poke_interval=5,
        timeout=120,
    )
    trigger >> sensor

The idea is that one DAG triggers another with a TriggerDagRunOperator. This sets the execution_date to the same value in both DAGs. It works perfectly when the state of dummy_dag's last task, ends, is success.

However, if I force the intermediate task to fail like so:

def print_context(ds, **context):
    pprint(context['conf'])
    raise Exception('ouch')

The sensor doesn't detect the failed or upstream_failed states, and it keeps running until it times out. I was using the failed_states parameter to indicate which states should be considered failures, but it doesn't seem to work.

Am I doing something wrong?

failed_states was added in Airflow 2.0; you'd set it to ["failed"] to configure the sensor to fail its own DAG run if the monitored DAG run failed. If given a task ID, it monitors the task state; otherwise it monitors the DAG run state.
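For reference, a minimal sketch of what the sensor from the question could look like on Airflow 2.x, where failed_states is actually honoured (the 2.x import path airflow.sensors.external_task is assumed here):

from airflow.sensors.external_task import ExternalTaskSensor  # 2.x import path

sensor = ExternalTaskSensor(
    task_id="wait_for_dag",
    external_dag_id="test_dummy_dag",
    external_task_id="ends",  # with a task ID the task state is checked;
                              # leave it out to check the DAG run state instead
    allowed_states=["success"],
    failed_states=["failed", "upstream_failed"],
    poke_interval=5,
    timeout=120,
)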

In Airflow 1.x, unfortunately, ExternalTaskSensor only compares the DAG run or task state against allowed_states; as soon as the monitored DAG run or task reaches one of the allowed states, the sensor stops and is then always marked as successful. By default the sensor only looks for the SUCCESS state, so without a timeout it will just keep poking forever if the monitored DAG run has failed. And if you put failed in the allowed_states list, the sensor will still only ever mark itself as successful.
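Roughly speaking, the 1.x poke boils down to counting task instances whose state is in allowed_states for the matching execution date, so a FAILED task simply never satisfies the check. The helper below is a paraphrase of that check (a sketch, not the actual Airflow source):

from airflow.models import TaskInstance as TI
from airflow.utils.db import provide_session
from airflow.utils.state import State

@provide_session
def count_allowed(external_dag_id, external_task_id, execution_date,
                  allowed_states=(State.SUCCESS,), session=None):
    # Paraphrase of the 1.10.x ExternalTaskSensor condition: only task
    # instances in allowed_states are counted, failed ones are invisible.
    return (
        session.query(TI)
        .filter(
            TI.dag_id == external_dag_id,
            TI.task_id == external_task_id,
            TI.state.in_(list(allowed_states)),
            TI.execution_date == execution_date,
        )
        .count()
    )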

While you could use a timeout, like you I needed the sensor to fail its own DAG run if the external DAG run failed, as if the dependencies for the next task had not been met. Unfortunately, this requires writing your own sensor.

Here is my implementation; it is a simplified version of the ExternalTaskSensor() class, adapted to my simpler needs (no need to check for a specific task ID or for anything other than the same execution date):

from airflow.exceptions import AirflowFailException
from airflow.models import DagRun
from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils.db import provide_session
from airflow.utils.decorators import apply_defaults
from airflow.utils.state import State

class ExternalDagrunSensor(BaseSensorOperator):
    """
    Waits for a different DAG to complete; if the dagrun has failed, this
    task fails itself as well.

    :param external_dag_id: The dag_id that contains the task you want to
        wait for
    :type external_dag_id: str
    """

    template_fields = ["external_dag_id"]
    ui_color = "#19647e"

    @apply_defaults
    def __init__(self, external_dag_id, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.external_dag_id = external_dag_id

    @provide_session
    def poke(self, context, session=None):
        dag_id, execution_date = self.external_dag_id, context["execution_date"]
        self.log.info("Poking for %s on %s ... ", dag_id, execution_date)

        state = (
            session.query(DagRun.state)
            .filter(
                DagRun.dag_id == dag_id,
                DagRun.execution_date == execution_date,
                DagRun.state.in_((State.SUCCESS, State.FAILED)),
            )
            .scalar()
        )
        if state == State.FAILED:
            raise AirflowFailException(
                f"The external DAG run {dag_id} {execution_date} has failed"
            )
        return state is not None

The base sensor implementation calls the poke() method repeatedly until it returns True (or the optional timeout is reached), and raising AirflowFailException sets the task state to failed immediately, with no retrying. It is then up to the downstream task configuration whether they will be scheduled to run.
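A minimal sketch of how this custom sensor could replace the ExternalTaskSensor in the sensors_dag from the question (it assumes the DAG definitions and imports shown above, and that ExternalDagrunSensor is importable from wherever you saved it):

with sensors_dag:
    trigger = TriggerDagRunOperator(
        task_id=f"trigger_{dummy_dag.dag_id}",
        trigger_dag_id=dummy_dag.dag_id,
        conf={"key": "value"},
        execution_date="{{ execution_date }}",
    )
    # Waits for the dummy_dag run with the same execution_date and fails
    # this DAG run as well if that run finishes in the FAILED state.
    sensor = ExternalDagrunSensor(
        task_id="wait_for_dag",
        external_dag_id=dummy_dag.dag_id,
        poke_interval=5,
        timeout=120,
    )
    trigger >> sensor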

ExternalTaskSensor just pokes until some expected state is reached; its own state is not meant to mirror the external task's state.

It defaults to [State.SUCCESS], which is why you see no problem when the external task succeeds. Adding allowed_states=[State.SUCCESS, State.FAILED, State.UPSTREAM_FAILED] to your code will at least ensure the external task has finished.

Additionally, you can set a timeout to make the sensor fail, provided soft_fail=False.
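A minimal sketch of that 1.x workaround (the sensor itself still only succeeds or times out; including the failed states just guarantees it stops poking once the external task has finished):

from airflow.sensors.external_task_sensor import ExternalTaskSensor
from airflow.utils.state import State

sensor = ExternalTaskSensor(
    task_id="wait_for_dag",
    external_dag_id="test_dummy_dag",
    external_task_id="ends",
    allowed_states=[State.SUCCESS, State.FAILED, State.UPSTREAM_FAILED],
    poke_interval=5,
    timeout=120,  # with soft_fail=False (the default) exceeding the timeout fails the task
)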

If you want the sensor to FAIL when the external task failed, you'll need to write your own implementation of such a sensor.

For example, here is how I check whether the last DAG run of a DAG matches a certain state:

@provide_session
def poke(self, context, session=None):
    """
    Checks if latest dag_run State is in expected state else keeps polling...
    :param context:
    :param session:
    :return:
    """
    DR = DagRun
    self.log.info(
        f"Poking for {self.external_dag_id}, {self.allowed_states} -> {self.state_condition} ... "
    )
    # If state is expected to match
    if self.state_condition:
        query = session.query(DR).filter(DR.dag_id == self.external_dag_id,
                                         DR.state.notin_(self.allowed_states))
    # If state is not expected to match
    else:
        query = session.query(DR).filter(DR.dag_id == self.external_dag_id,
                                         DR.state.in_(self.allowed_states))
    # Filter by last_dagrun, could be max(execution_date) also but avoiding such aggregation
    # by sorting dag_run chronologically in descendent order
    query = query.order_by(DR.execution_date.desc()).first()
    session.commit()
    return not query
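The poke() above references attributes that are set elsewhere; a possible skeleton for the surrounding class (the class name and the allowed_states/state_condition constructor arguments are assumptions mirroring the snippet, not the author's actual code) could look like:

from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils.decorators import apply_defaults
from airflow.utils.state import State


class LastDagrunStateSensor(BaseSensorOperator):
    """Waits until the last DAG run of external_dag_id is (or is no longer)
    in one of allowed_states, depending on state_condition."""

    @apply_defaults
    def __init__(self, external_dag_id, allowed_states=None,
                 state_condition=True, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.external_dag_id = external_dag_id
        self.allowed_states = allowed_states or [State.SUCCESS]
        self.state_condition = state_condition

    # the poke() method shown above goes here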
