
How to dynamically generate a downstream list from a task in Airflow

I have a main task that holds its logic in this function

I'm not completely sure how to do this. Maybe I need another task in between? Any help is appreciated. Thanks!

If I understood correctly, you already have multiple tasks created, but you need to dynamically define which of them will run downstream. If that is the case, you can safely use BranchPythonOperator:

It derives the PythonOperator and expects a Python function that returns a single task_id or list of task_ids to follow. The task_id(s) returned should point to a task directly downstream from {self}. All other "branches" or directly downstream tasks are marked with a state of skipped so that these paths can't move forward. The skipped states are propagated downstream to allow for the DAG state to fill up and the DAG run's state to be inferred.

Consider the following, based on the example_dag distributed with Airflow:

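# Imports assume Airflow 2.x module paths.
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import BranchPythonOperator
from airflow.utils.dates import days_ago
from airflow.utils.edge_modifier import Label

# Assumed default_args; replace with your own.
args = {"owner": "airflow"}

with DAG(
    dag_id="branch_multiple_tasks",
    default_args=args,
    start_date=days_ago(1),
    schedule_interval="@daily",
    tags=["example"],
) as dag: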

    run_this_first = DummyOperator(
        task_id="run_this_first",
    )

    options = ["branch_a", "branch_b", "branch_c", "branch_d"]

    branching = BranchPythonOperator(
        task_id="branching",
        python_callable=lambda: options[1:3],
    )
    run_this_first >> branching

    join = DummyOperator(
        task_id="join",
        trigger_rule="none_failed_or_skipped",
    )

    for option in options:
        t = DummyOperator(
            task_id=option,
        )

        dummy_follow = DummyOperator(
            task_id="follow_" + option,
        )

        # Label is optional here, but it can help identify more complex branches
        branching >> Label(option) >> t >> dummy_follow >> join

In this example, the python_callable passed to the branching task is hardcoded to return ['branch_b', 'branch_c']. You could provide your own callable and return a list of task_ids as strings, based on any criteria. You could even use your get_campaign_active function, as long as it returns the expected format. It may be cleaner to create a new function that performs an xcom_pull on the result of the previous one. I guess it's up to your needs.
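As a rough sketch of the xcom_pull variant, the callable below reads whatever an upstream task returned and maps it to task_ids. Several things here are assumptions made for illustration, not part of the question: that an upstream task with task_id "get_campaign_active" exists and returns a list of short names such as ["b", "c"], that each name maps to a task called "branch_<name>", and the function name choose_branches itself.

```python
from typing import Any, List

# Task ids that sit directly downstream of the branching task,
# matching the `options` list in the example DAG.
OPTIONS = ["branch_a", "branch_b", "branch_c", "branch_d"]


def choose_branches(ti: Any, upstream_task_id: str = "get_campaign_active") -> List[str]:
    """Return the task_ids to follow, based on an upstream task's XCom value.

    `ti` is the running TaskInstance. The upstream task id and the
    "branch_<name>" naming convention are assumptions for this sketch.
    """
    # xcom_pull gives back the upstream task's return value, or None
    # if nothing was pushed.
    active = ti.xcom_pull(task_ids=upstream_task_id) or []

    # Translate each pulled name into the task_id it is assumed to map to.
    wanted = {f"branch_{name}" for name in active}

    # Keep only ids that really exist downstream, in declaration order,
    # so the operator is never handed an unknown task_id.
    return [task_id for task_id in OPTIONS if task_id in wanted]
```

In the DAG above this would replace the lambda, i.e. python_callable=choose_branches. Airflow 2.x passes context variables such as ti to the callable when its signature names them. Filtering against OPTIONS is a defensive choice; how your Airflow version treats an empty list is worth checking before relying on it.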

Graph view: [image: graph_view_of_the_example]

Let me know if that worked for you!
