
Airflow tasks in a loop based on dag_run conf value

I am trying to create multiple Airflow tasks based on dag_run conf input. The conf would have an array of values, and each value needs to spawn a task. The task in turn needs to pass the value to its callable function. Something like this:

 #create this task in a loop
 task = PythonOperator(task_id="fetch_data", python_callable=fetch_data(value from array), retries=10)

The conf would have a value like:

{"fruits":["apple","kiwi","orange"]}

I think this can be accessed with:

kwargs['dag_run'].conf['fruits']

How do I access this value outside an operator and then create operators in a loop?

You can wrap your PythonOperator instantiation in a for loop that consumes the list of values.

from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator

from datetime import datetime

dag = DAG(
    dag_id='fruit_name_printer',
    start_date=datetime(2021, 1, 1),
    schedule_interval='@once'
)

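# Static list, known when the DAG file is parsed.
fruits = [
    'apple',
    'orange',
    'banana'
]


def call_func(fruit_name):
    print(fruit_name)


with dag:
    # One PythonOperator per list entry, each with a unique task_id.
    for fruit in fruits: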
        printer = PythonOperator(
            task_id=f'print_{fruit}',
            python_callable=call_func,
            op_kwargs={
                'fruit_name': fruit
            }
        )
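Note that the loop above is driven by a list that is known when the DAG file is parsed, whereas dag_run.conf only exists once a run has been triggered, so it cannot feed that loop directly. If separate tasks per value are not strictly required, one option is to read the conf inside a single callable and loop there. The following is a minimal sketch under that assumption: `fetch_data` and `fetch_all` are hypothetical names, and `SimpleNamespace` stands in for the real DagRun object so the snippet runs without Airflow installed.

```python
from types import SimpleNamespace


def fetch_data(fruit_name):
    # Placeholder for the real per-value work (hypothetical).
    return f"fetched {fruit_name}"


def fetch_all(**kwargs):
    # dag_run.conf may be None or empty when the run was triggered
    # without a payload, so fall back to an empty dict.
    conf = kwargs["dag_run"].conf or {}
    return [fetch_data(fruit) for fruit in conf.get("fruits", [])]


# Stand-in for the context Airflow passes to the callable at run time.
fake_context = {
    "dag_run": SimpleNamespace(conf={"fruits": ["apple", "kiwi", "orange"]})
}
print(fetch_all(**fake_context))
```

In a DAG this would be wired up as a single PythonOperator with python_callable=fetch_all. On Airflow 1.10 the operator also needs provide_context=True for dag_run to arrive in kwargs; on Airflow 2 the context is passed automatically.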

I wish Airflow had some sort of parallel-for operator like Kubeflow has. To solve this problem in Airflow, I ended up triggering another DAG run for each item using the TriggerDagRunOperator. It looks something like this:

from airflow.operators.dagrun_operator import TriggerDagRunOperator

def trigger_extract_dag(**kwargs):
    config = kwargs['dag_run'].conf

    for stuff_dict in config['stuff_to_extract']:
        dag_task = TriggerDagRunOperator(
            task_id='trigger-extraction-dag',
            trigger_dag_id=EXTRACTION_DAG_NAME,
            conf=stuff_dict
        )
        dag_task.execute(dict())

I had the same question. My workaround is to use Airflow Variables.

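from airflow.models import DAG, Variable
from airflow.operators.python_operator import PythonOperator

# The Variable holds JSON, e.g. an object with a "task_list" array.
foo_json = Variable.get("foo_baz", deserialize_json=True)

with DAG(...) as dag:
    # One task per entry in the list stored in the Variable.
    for x in foo_json['task_list']:
        t1 = PythonOperator(
            task_id=f'task_{x}', ...)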

The Variable's content includes a list, which can be used to create a dynamic number of tasks with a for loop inside a DAG. Every time I need a different config for the tasks, I simply change the Variable right before a DAG run. It takes two commands rather than one to get that functionality, whether you work manually, through the API, or from the command line.
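To make the data flow concrete, here is a rough sketch of what the Variable lookup amounts to: with deserialize_json=True the stored string is parsed as JSON, and the resulting list drives the task ids. The stored value below is a hypothetical example, and the snippet uses json.loads in place of Variable.get so it runs without Airflow.

```python
import json

# Hypothetical contents of the "foo_baz" Variable (an assumption for illustration).
stored_value = '{"task_list": ["apple", "kiwi", "orange"]}'

# Variable.get("foo_baz", deserialize_json=True) parses the stored string as JSON.
foo_json = json.loads(stored_value)

# Each list entry becomes one task; task ids must be unique within a DAG.
task_ids = [f"task_{x}" for x in foo_json["task_list"]]
print(task_ids)
```

With the Airflow 2 CLI, the two steps would be something like `airflow variables set foo_baz '<json>'` followed by `airflow dags trigger <dag_id>`. Because the loop runs when the DAG file is parsed, the Variable has to be updated before the run is triggered.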
