BranchOperator is getting skipped in Airflow
I have this flow:
execution_date_hour = "{{ execution_date.strftime('%H') }}"

default_args = {
    'owner': 'hourly-airflow',
    'depends_on_past': False,
    'catch_up': False,
    'start_date': days_ago(1),
    'email': failure_email_list,
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG('hourly_pipeline_dag',
          default_args=default_args,
          tags=['hourly'],
          schedule_interval='@hourly',
          catchup=False)

taskA = PostgresOperator(dag=dag,
                         task_id='taskA',
                         postgres_conn_id='database_connection',
                         sql='sql/hourly_entry.sql')

taskb = DummyOperator(
    dag=dag,
    task_id="taskb"
)

taske = DummyOperator(
    dag=dag,
    task_id="taske"
)

taskc = DummyOperator(
    dag=dag,
    task_id="taskc"
)

taskd = DummyOperator(
    dag=dag,
    task_id="taskd"
)

branch_op = BranchPythonOperator(
    task_id='branch_op',
    python_callable=lambda **kwargs: 'feed_sensor_a' if execution_date_hour == '5' else 'feed_sensor_b',
    dag=dag)

feed_sensor_a = SqlSensor(dag=dag,
                          task_id='feed_sensor_a',
                          conn_id='database_connection',
                          sql='sql/sensor_hourly.sql',
                          poke_interval=30,
                          trigger_rule=TriggerRule.ONE_SUCCESS,
                          timeout=3600)

feed_sensor_b = SqlSensor(dag=dag,
                          task_id='feed_sensor_b',
                          conn_id='database_connection',
                          sql='sql/sensor.sql',
                          poke_interval=30,
                          trigger_rule=TriggerRule.ONE_SUCCESS,
                          timeout=3600)

taskA >> [taskb, taskc]
taskb >> taskd
taskc >> taske
[taskd, taske] >> branch_op
branch_op >> [feed_sensor_a, feed_sensor_b]
The pipeline runs through taskd and taske, but branch_op is skipped. Please help, I have been stuck on this for a long time. Everything up to taske and taskd runs fine; branch_op is highlighted in red, i.e. skipped, and I don't know what is happening here. (All of these tasks are dummy tasks; in the actual pipeline they are HttpOperator and Postgres operators.) Thanks in advance, and let me know if any other information is required.
Running your code, I don't see the branch_op task failing or being skipped. However, I don't think your BranchPythonOperator task will work as you'd like it to. No inputs are passed into the lambda function, and python_callable is not a templated field for the operator (i.e. the comparison is evaluated against the literal string "{{ execution_date.strftime('%H') }}"), so the flow would always follow feed_sensor_b. Try this instead:
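branch_op = BranchPythonOperator(
    task_id="branch_op",
    # '%H' is zero-padded, so 5 AM renders as "05", not "5".
    python_callable=lambda execution_date_hour: "feed_sensor_a" if execution_date_hour == "05" else "feed_sensor_b",
    # op_args is a templated field, so the Jinja expression is rendered before the callable runs.
    op_args=[execution_date_hour],
    dag=dag,
)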
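To see why the lambda in the question always returns feed_sensor_b, here is a minimal plain-Python sketch that runs without Airflow. It assumes only what is described above: the module-level string is never rendered, so the comparison sees the raw template text. The helper name pick_branch is introduced here for illustration and is not part of the question's code.

```python
# The template text exactly as it exists at DAG-parse time; nothing renders it here.
execution_date_hour = "{{ execution_date.strftime('%H') }}"


def pick_branch(hour):
    """Same comparison as the lambda in the question (hypothetical helper name)."""
    return 'feed_sensor_a' if hour == '5' else 'feed_sensor_b'


# The unrendered template can never equal '5', so the else branch always wins.
unrendered_choice = pick_branch(execution_date_hour)
print(unrendered_choice)
```

Passing the string through a templated field such as op_args lets Airflow render it before the callable sees it.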
I cannot reproduce a failure in task branch_op. However, even if it was running, it would always take the else branch, because BranchPythonOperator does not render execution_date for you automatically: python_callable is not in its template field list, so the lambda only ever sees the raw template string. So I did two things:
1. Changed the comparison value to 05, because %H translates to this zero-padded format.
2. Used op_kwargs of BranchPythonOperator to pass the hour.

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator
from datetime import datetime, timedelta

execution_date_hour = "{{ execution_date.strftime('%H') }}"
failure_email_list = ['myemail']

default_args = {
    'owner': 'hourly-airflow',
    'depends_on_past': False,
    'catch_up': False,
    'start_date': datetime(2020, 12, 14, 0, 0),
    'email': failure_email_list,
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 3,
}

dag = DAG('hourly_pipeline_dag',
          default_args=default_args,
          tags=['hourly'],
          schedule_interval='@hourly',
          catchup=False)

with dag:
    taskA = DummyOperator(task_id='taskA')
    taskB = DummyOperator(task_id='taskB')
    taskC = DummyOperator(task_id='taskC')
    taskD = DummyOperator(task_id='taskD')
    taskE = DummyOperator(task_id='taskE')

    branch_op = BranchPythonOperator(
        task_id='branch_op',
        python_callable=lambda hour: 'feed_sensor_5' if hour == "05" else 'feed_sensor_not5',
        op_kwargs={'hour': execution_date_hour},
    )

    feed_sensor_5 = DummyOperator(task_id='feed_sensor_5')
    feed_sensor_not5 = DummyOperator(task_id='feed_sensor_not5')

    taskA >> [taskB, taskC]
    taskB >> taskD
    taskC >> taskE
    [taskD, taskE] >> branch_op
    branch_op >> [feed_sensor_5, feed_sensor_not5]
Now it goes to the if branch for execution date 2021-05-01 05:00:00 and to the else branch for 2021-05-01 06:00:00.
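As an alternative sketch, not taken from either answer: the callable can read execution_date from the task context and compare the integer hour, which sidesteps both the templating question and the "5" versus "05" string mismatch. This assumes Airflow passes the context to python_callable as keyword arguments (on 1.10.x that requires provide_context=True on the operator). The name choose_branch is hypothetical, and a plain datetime stands in for the real context so the snippet runs without Airflow.

```python
from datetime import datetime


def choose_branch(**context):
    """Return the downstream task id based on the run's execution_date."""
    # Compare the integer hour instead of a formatted string.
    if context["execution_date"].hour == 5:
        return "feed_sensor_5"
    return "feed_sensor_not5"


# Stand-ins for the context Airflow would supply at run time (assumption).
at_five = choose_branch(execution_date=datetime(2021, 5, 1, 5, 0))
at_six = choose_branch(execution_date=datetime(2021, 5, 1, 6, 0))

# '%H' is zero-padded, which is why the string comparison needs "05".
padded_hour = datetime(2021, 5, 1, 5, 0).strftime("%H")
print(at_five, at_six, padded_hour)
```

In the DAG above this would be wired in with python_callable=choose_branch, dropping op_kwargs.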