Airflow tasks in a loop based on dag_run conf value
I am trying to create multiple Airflow tasks based on the dag_run conf input. The conf will contain an array of values, and each value needs to spawn a task. Each task in turn needs to pass its value to its callable function. Something like this:
# create this task in a loop
task = PythonOperator(
    task_id="fetch_data",
    python_callable=fetch_data,  # somehow pass in the value from the array
    retries=10,
)
Conf would have a value like:
{"fruits":["apple","kiwi","orange"]}
I think this can be accessed with (conf is a dict, so bracket access):
kwargs['dag_run'].conf['fruits']
How do I access this value outside an operator and then create operators in a loop?
You can wrap your PythonOperator instantiation in a for loop that consumes the list of values.
from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import datetime

dag = DAG(
    dag_id='fruit_name_printer',
    start_date=datetime(2021, 1, 1),
    schedule_interval='@once',
)

# renamed from `input` to avoid shadowing the built-in
fruits = [
    'apple',
    'orange',
    'banana',
]

def call_func(fruit_name):
    print(fruit_name)

with dag:
    # one task per list entry, each bound to its own value via op_kwargs
    for fruit in fruits:
        printer = PythonOperator(
            task_id=f'print_{fruit}',
            python_callable=call_func,
            op_kwargs={
                'fruit_name': fruit,
            },
        )
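Note that dag_run.conf only exists at runtime, not at DAG parse time, so if the list must come from the trigger payload rather than a static list, the callable itself can read it. A minimal sketch assuming the {"fruits": [...]} payload from the question (on Airflow 1.x you would also need provide_context=True on the operator):

# sketch: read the trigger payload inside the callable at runtime
def fetch_data(**kwargs):
    dag_run = kwargs.get('dag_run')
    # conf is a plain dict; it can be None for scheduled (untriggered) runs
    fruits = (dag_run.conf or {}).get('fruits', []) if dag_run else []
    for fruit in fruits:
        print(fruit)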
I wish there were some sort of parallel-for operator in Airflow like Kubeflow has. To solve this problem with Airflow, I ended up triggering another DAG run using the TriggerDagRunOperator... it looks something like this:
from airflow.operators.dagrun_operator import TriggerDagRunOperator

def trigger_extract_dag(**kwargs):
    config = kwargs['dag_run'].conf
    # fire off one extraction DAG run per item in the conf list
    for stuff_dict in config['stuff_to_extract']:
        dag_task = TriggerDagRunOperator(
            task_id='trigger-extraction-dag',
            trigger_dag_id=EXTRACTION_DAG_NAME,
            conf=stuff_dict,
        )
        dag_task.execute(dict())
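On the receiving side, each triggered run sees its stuff_dict as its own dag_run.conf. A minimal sketch of a task callable in the extraction DAG (the function name is illustrative, not from the original answer):

def extract(**kwargs):
    # each triggered run receives one stuff_dict as its conf
    stuff = kwargs['dag_run'].conf
    print(stuff)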
Had the same question. My workaround is to use Airflow Variables.
from airflow.models import Variable

# deserialize_json turns the stored JSON string back into a dict
foo_json = Variable.get("foo_baz", deserialize_json=True)

with DAG (...) as dag:
    for x in foo_json['task_list']:
        t1 = PythonOperator(
            task_id=f'task_{x}', ...)
The content is a list, which can be used to create a dynamic number of tasks with a for loop inside a DAG. Every time I need a different config for the tasks, I simply change the variable right before triggering a DagRun. It takes two commands rather than one to get that functionality, regardless of whether you do it manually, via the API, or via the CLI.
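For reference, those two steps can also be scripted. A minimal sketch using Variable.set; the key and payload mirror the example above, and the trigger step could equally be done from the UI or API:

from airflow.models import Variable

# step 1: write the per-run task list; serialize_json stores it as JSON
Variable.set("foo_baz", {"task_list": ["apple", "orange"]}, serialize_json=True)
# step 2: trigger the DAG (Airflow 2 CLI: airflow dags trigger <dag_id>)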