
Airflow, mark a task success or skip it before dag run

We have a huge DAG, with many small and fast tasks and a few big and time-consuming tasks.

We want to run just a part of the DAG, and the easiest way that we found is to not add the tasks that we don't want to run. The problem is that our DAG has many co-dependencies, so it became a real challenge not to break the DAG when we want to skip some tasks.

Is there a way to add a status to a task by default (for every run)? Something like:

# get the skip list from an Airflow Variable
task_list = models.Variable.get('list_of_tasks_to_skip')

dag.skip(task_list)

or

for task in task_list:
    task.status = 'success'

As mentioned in the comments, you should use the BranchPythonOperator (or ShortCircuitOperator) to prevent the time-consuming tasks from executing. If you need downstream operators of these time-consuming tasks to run, you can use TriggerRule.ALL_DONE to have those operators run, but note that they will then run even when the upstream operators fail.
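
A minimal sketch of this pattern, for illustration only (the DAG id, task ids, Variable name and schedule are assumptions, and the imports use the Airflow 1.10 module paths; in Airflow 2.x the operators live under airflow.operators.python and airflow.operators.dummy): a BranchPythonOperator reads an Airflow Variable to decide whether the time-consuming task runs, and the join task uses TriggerRule.ALL_DONE so it still runs when that branch is skipped, which also means it runs even if an upstream task fails.

from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator
from airflow.utils.trigger_rule import TriggerRule


def choose_branch():
    # Return the task_id of the branch to follow; the other branch gets skipped.
    if Variable.get('run_time_consuming_task', default_var='true') == 'true':
        return 'time_consuming_task'
    return 'skip_time_consuming_task'


with DAG('partial_run_example', start_date=datetime(2020, 1, 1),
         schedule_interval=None, catchup=False) as dag:
    branch = BranchPythonOperator(task_id='branch', python_callable=choose_branch)
    time_consuming_task = DummyOperator(task_id='time_consuming_task')
    skip_time_consuming_task = DummyOperator(task_id='skip_time_consuming_task')
    # ALL_DONE lets the join run once both branches have finished,
    # whether they succeeded, were skipped, or failed.
    join = DummyOperator(task_id='join', trigger_rule=TriggerRule.ALL_DONE)

    branch >> [time_consuming_task, skip_time_consuming_task] >> join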

You can use Airflow Variables to affect these BranchPythonOperators without having to update the DAG, e.g.:

from airflow.models import Variable

def branch_python_operator_callable():
    # The Variable is expected to hold the task_id of the branch to follow.
    return Variable.get('time_consuming_operator_var')

and use branch_python_operator_callable as the Python callable for your BranchPythonOperator.
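
A short usage sketch wiring that callable into a BranchPythonOperator (the task ids, the dag object and the fast-path task are assumptions for illustration): the time_consuming_operator_var Variable is expected to hold the task_id of the branch to follow, so a run can be redirected from the Airflow UI (Admin > Variables) without editing the DAG file.

from airflow.operators.python_operator import BranchPythonOperator

choose_path = BranchPythonOperator(
    task_id='choose_path',
    python_callable=branch_python_operator_callable,  # defined above
    dag=dag,
)
# Whichever task_id the Variable names is followed; the other branch is skipped.
choose_path >> [time_consuming_task, fast_path_task]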
