
Airflow BranchPythonOperator doesn't follow the specified branch

I have an Airflow DAG with the following structure.

[Image: Airflow graph view]

All the tasks whose names start with "check" are BranchPythonOperator instances, and exceptionControl is an ExecuteDagRunOperator that receives every error in order to handle it.

This is the DAG configuration:

checkCloudFunctions = BranchPythonOperator(
    task_id='checkCloudFunctions',
    python_callable=check_cloud_functions,
    provide_context=True,
    dag=dag)

checkSqlTables = BranchPythonOperator(
    task_id='checkSqlTables',
    python_callable=check_sql_tables,
    provide_context=True,
    dag=dag)

checkBigQueryTable = BranchPythonOperator(
    task_id='checkBigQueryTable',
    python_callable=check_big_query_table,
    provide_context=True,
    dag=dag)

labBuilt = DummyOperator(
    task_id='labBuilt',
    dag=dag)

exceptionControl = ExecuteDagRunOperator(
    task_id='exceptionControl',
    execute_dag_id="SYS_exception_control",
    python_callable=mediation.dag_trigger_exception,
    trigger_rule='one_success',
    dag=dag)

# graphs
checkCloudFunctions >> checkSqlTables
checkCloudFunctions >> exceptionControl

checkSqlTables >> checkBigQueryTable
checkSqlTables >> exceptionControl

checkBigQueryTable >> labBuilt
checkBigQueryTable >> exceptionControl

The problem is that checkSqlTables should branch to exceptionControl, but exceptionControl is skipped and the DAG ends. The function returns "exceptionControl", as we can see in the checkSqlTables log:

   {base_task_runner.py:98} INFO - {python_operator.py:90} INFO - Done. Returned value was: exceptionControl
   {base_task_runner.py:98} INFO - {python_operator.py:118} INFO - Following branch exceptionControl
   {base_task_runner.py:98} INFO - {python_operator.py:119} INFO - Marking other directly downstream tasks as skipped
   {base_task_runner.py:98} INFO - {python_operator.py:128} INFO - Done.

I also played with the trigger_rule attribute (one_success, dummy...), but it doesn't seem to work.

If I delete the first step, it seems to work, so it is probably some kind of configuration problem with my DAG.


Any ideas why checkSqlTables doesn't branch to exceptionControl?

EDIT: On a closer reading of the Airflow documentation, I noticed that once a step marks a task as skipped, that task stays skipped, so my code will never work with branching operators.

The solutions that use branching rely on a dummy step before every step. But some of my DAGs have more than 10 steps, and the graph would become a complete mess.
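To make the behaviour described in the EDIT concrete, here is a minimal sketch in plain Python. It is a toy model and not Airflow's actual scheduler: it only assumes that a branch task marks every direct downstream task it did not choose as skipped, that a skipped task stays skipped, and that a task runs when at least one upstream task succeeded. The task ids of the first layout come from the question; the `*Failed` task ids in the second layout are hypothetical names for the per-step dummy tasks.

```python
# Toy model of branch skipping. This is NOT Airflow code; it only encodes the
# three assumptions stated above.

def run_dag(edges, branch_choice, order):
    """Return {task_id: 'success' | 'skipped'} for the tasks listed in `order`."""
    state = {}
    for task in order:  # `order` must list upstream tasks before downstream ones
        if state.get(task) == "skipped":
            continue  # already skipped by an earlier branch; never revisited
        upstream = [u for u, downs in edges.items() if task in downs]
        if upstream and not any(state.get(u) == "success" for u in upstream):
            state[task] = "skipped"  # no upstream task succeeded
            continue
        state[task] = "success"
        if task in branch_choice:
            # A branch task skips every direct downstream task it did not pick.
            for downstream in edges.get(task, []):
                if downstream != branch_choice[task]:
                    state[downstream] = "skipped"
    return state


# Layout from the question: every check task points straight at exceptionControl.
original_edges = {
    "checkCloudFunctions": ["checkSqlTables", "exceptionControl"],
    "checkSqlTables": ["checkBigQueryTable", "exceptionControl"],
    "checkBigQueryTable": ["labBuilt", "exceptionControl"],
}
original = run_dag(
    original_edges,
    {"checkCloudFunctions": "checkSqlTables", "checkSqlTables": "exceptionControl"},
    ["checkCloudFunctions", "checkSqlTables", "checkBigQueryTable",
     "labBuilt", "exceptionControl"],
)

# Workaround layout: one dummy task per check (hypothetical ids) in front of
# exceptionControl, so no two branch tasks share a direct downstream task.
dummy_edges = {
    "checkCloudFunctions": ["checkSqlTables", "cloudFunctionsFailed"],
    "checkSqlTables": ["checkBigQueryTable", "sqlTablesFailed"],
    "checkBigQueryTable": ["labBuilt", "bigQueryTableFailed"],
    "cloudFunctionsFailed": ["exceptionControl"],
    "sqlTablesFailed": ["exceptionControl"],
    "bigQueryTableFailed": ["exceptionControl"],
}
with_dummies = run_dag(
    dummy_edges,
    {"checkCloudFunctions": "checkSqlTables", "checkSqlTables": "sqlTablesFailed"},
    ["checkCloudFunctions", "cloudFunctionsFailed", "checkSqlTables",
     "sqlTablesFailed", "checkBigQueryTable", "bigQueryTableFailed",
     "labBuilt", "exceptionControl"],
)

print(original["exceptionControl"])      # skipped
print(with_dummies["exceptionControl"])  # success
```

In the first layout, checkCloudFunctions has already marked exceptionControl as skipped before checkSqlTables gets to choose it. In the second layout each check task only skips its own dummy task, so exceptionControl can still run, at the cost of one extra task per step.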

This seems to be a bug discussed here: https://github.com/apache/airflow/issues/10725

Based on that discussion, a fix was merged here: https://github.com/apache/airflow/pull/11120
