
BranchOperator is getting skipped airflow

I have this flow:

execution_date_hour = "{{ execution_date.strftime('%H') }}"

default_args = {
    'owner': 'hourly-airflow',
    'depends_on_past': False,
    'catch_up': False,
    'start_date': days_ago(1),
    'email': failure_email_list,
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG('hourly_pipeline_dag',
          default_args=default_args,
          tags=['hourly'],
          schedule_interval='@hourly',
          catchup=False)

taskA = PostgresOperator(dag=dag,
                         task_id='taskA', 
                         postgres_conn_id='database_connection',
                         sql='sql/hourly_entry.sql')

taskb = DummyOperator(
    dag=dag,
    task_id="taskb"
)

taske = DummyOperator(
    dag=dag,
    task_id="taske"
)

taskc = DummyOperator(
    dag=dag,
    task_id="taskc"
)

taskd = DummyOperator(
    dag=dag,
    task_id="taskd"
)

branch_op = BranchPythonOperator(
    task_id='branch_op',
    python_callable=lambda
        **kwargs: 'feed_sensor_a' if execution_date_hour == '5' else 'feed_sensor_b',
    dag=dag)

feed_sensor_a = SqlSensor(dag=dag,
                          task_id='feed_sensor_a',
                          conn_id='database_connection',
                          sql='sql/sensor_hourly.sql',
                          poke_interval=30,
                          trigger_rule=TriggerRule.ONE_SUCCESS,
                          timeout=3600)

feed_sensor_b = SqlSensor(dag=dag,
                          task_id='feed_sensor_b',
                          conn_id='database_connection',
                          sql='sql/sensor.sql',
                          poke_interval=30,
                          trigger_rule=TriggerRule.ONE_SUCCESS,
                          timeout=3600)

taskA >> [taskb,taskc]
taskb >> taskd
taskc >> taske
[taskd,taske] >> branch_op
branch_op >> [feed_sensor_a,feed_sensor_b] 

The pipeline runs fine until taskd and taske, but branch_op is highlighted in red, i.e. skipped, and I don't know what happens there. Please help, I have been stuck on this for so long. (All these tasks are dummy tasks here; in the actual DAG they are HttpOperator and Postgres operators.) Thanks in advance, and let me know if any other info is required.

Running your code I don't see the branch_op task failing or being skipped. However, I don't think your BranchPythonOperator task will work as you'd like it to. There are no inputs being passed into the lambda function, and python_callable is not a templated field for the operator (i.e. execution_date_hour evaluates to the literal string "{{ execution_date.strftime('%H') }}", so the comparison is always False and the flow would always follow feed_sensor_b). Try this instead:

branch_op = BranchPythonOperator(
    task_id="branch_op",
    python_callable=lambda execution_date_hour: "feed_sensor_a" if execution_date_hour == "5" else "feed_sensor_b",
    op_args=[execution_date_hour],
    dag=dag,
)
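To see why the original comparison can never match: at module level the Jinja expression is just a plain Python string, and it is only rendered when Airflow templates an operator field such as op_args. A minimal stdlib sketch of the bug, outside Airflow:

```python
# At module import time the Jinja expression is NOT rendered --
# execution_date_hour is just this literal string:
execution_date_hour = "{{ execution_date.strftime('%H') }}"

# The lambda in the original DAG compares that literal string to '5',
# which is always False, so the branch always returns 'feed_sensor_b'.
branch = 'feed_sensor_a' if execution_date_hour == '5' else 'feed_sensor_b'
print(branch)  # prints "feed_sensor_b"
```

Passing the string through op_args instead lets Airflow render it before calling the lambda, so the callable receives the actual hour.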

I cannot reproduce the failure in task branch_op. However, even if it was running, it would always take the else branch, because BranchPythonOperator does not automatically include execution_date in its template fields list. So I did a few things:

  • Reformatted the DAG a bit
  • Updated the condition to check for 05, because %H translates to this zero-padded format
  • Used the op_kwargs argument of BranchPythonOperator to pass the hour.
from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator
from datetime import datetime, timedelta

execution_date_hour = "{{ execution_date.strftime('%H') }}"
failure_email_list = ['myemail']

default_args = {
    'owner': 'hourly-airflow',
    'depends_on_past': False,
    'catch_up': False,
    'start_date': datetime(2020, 12, 14, 0, 0),
    'email': failure_email_list,
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 3,
}

dag = DAG('hourly_pipeline_dag',
          default_args=default_args,
          tags=['hourly'],
          schedule_interval='@hourly',
          catchup=False)

with dag:
    taskA = DummyOperator(task_id='taskA')
    taskB = DummyOperator(task_id='taskB')
    taskC = DummyOperator(task_id='taskC')
    taskD = DummyOperator(task_id='taskD')
    taskE = DummyOperator(task_id='taskE')

    branch_op = BranchPythonOperator(
        task_id='branch_op',
        python_callable=lambda hour: 'feed_sensor_5' if hour == "05" else 'feed_sensor_not5',
        op_kwargs={'hour': execution_date_hour},
    )
    feed_sensor_5 = DummyOperator(task_id='feed_sensor_5')
    feed_sensor_not5 = DummyOperator(task_id='feed_sensor_not5')

    taskA >> [taskB,taskC]
    taskB >> taskD
    taskC >> taskE
    [taskD,taskE] >> branch_op
    branch_op >> [feed_sensor_5,feed_sensor_not5] 

Now it goes to the if branch for execution date 2021-05-01 05:00:00 and to the else branch for 2021-05-01 06:00:00.
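As a side check, the zero-padding behavior of %H (the reason the condition compares against "05" rather than "5") can be verified with the standard library alone:

```python
from datetime import datetime

# %H is the zero-padded 24-hour clock, so 5 AM formats as "05", not "5"
print(datetime(2021, 5, 1, 5, 0).strftime('%H'))   # prints "05"
print(datetime(2021, 5, 1, 6, 0).strftime('%H'))   # prints "06"
```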
