
How to parallelize similar BashOperator tasks with different parameters in an Airflow DAG

My DAG below runs 2 tasks in parallel. In the real world there could be 15 or 20 such tasks, with the input parameters coming from an array like the one below.

fruits = ["apples", "bananas"]

bad_dag = DAG('bad_dag_3', default_args=default_args, schedule_interval=None)

t0=BashOperator(
    task_id="print",
    bash_command='echo "Beginning parallel tasks next..." ',
    dag=bad_dag)

t1=BashOperator(
    task_id="fruit_"+fruits[0],
    params={"fruits": fruits}, 
    bash_command='echo fruit= {{params.fruits[0]}} ',
    dag=bad_dag)

t2=BashOperator(
    task_id="fruit_"+fruits[1],
    params={"fruits": fruits},
    bash_command='echo fruit= {{params.fruits[1]}} ',
    dag=bad_dag)

t0>>[t1, t2]

What's the best way to write this DAG so that I don't have to rewrite the same BashOperator over and over again, as I have above?

I don't think I can use a loop, because then the tasks would not run in parallel.

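Use the DAG below. A loop works here: it only runs when Airflow parses the DAG file and defines the tasks; it does not execute them. Whether tasks run in parallel depends on their dependencies, not on how they were created. As long as each task has a unique task_id and depends only on t0, Airflow will handle the rest.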

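# Import path for Airflow 2.x; on Airflow 1.10 use airflow.operators.bash_operator instead.
from airflow import DAG
from airflow.operators.bash import BashOperator

fruits = ["apples", "bananas"]

# default_args is assumed to be defined elsewhere, as in the question.
bad_dag = DAG('bad_dag_3', default_args=default_args, schedule_interval=None)

t0 = BashOperator(
    task_id="print",
    bash_command='echo "Beginning parallel tasks next..." ',
    dag=bad_dag)

# The loop runs once at parse time and only defines the tasks.
for fruit in fruits:
    task_t = BashOperator(
        task_id="fruit_" + fruit,  # task_id must be unique within the DAG
        params={"fruit": fruit},
        bash_command='echo fruit= {{params.fruit}} ',
        dag=bad_dag)

    # Every fruit task depends only on t0, so they can all run in parallel.
    t0 >> task_t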
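
To see why the loop does not serialize anything, it may help to model the dependency graph it builds. The sketch below is plain Python with no Airflow import; the `upstream` dict and the `runnable` helper are hypothetical names made up for illustration and are not part of Airflow's API or its scheduler. It only assumes that a task becomes eligible to run once all of its upstream tasks have finished.

```python
fruits = ["apples", "bananas"]

# upstream[task_id] = set of task_ids that must finish first.
# "print" is the root task and has no upstream dependencies.
upstream = {"print": set()}

for fruit in fruits:
    # Mirrors `t0 >> task_t`: each fruit task depends only on "print",
    # not on the fruit task created in the previous loop iteration.
    upstream["fruit_" + fruit] = {"print"}


def runnable(upstream, done):
    """Return the tasks whose upstream tasks have all finished."""
    return sorted(
        task for task, deps in upstream.items()
        if task not in done and deps <= done
    )


first_wave = runnable(upstream, done=set())
second_wave = runnable(upstream, done={"print"})
print(first_wave)   # ['print']
print(second_wave)  # ['fruit_apples', 'fruit_bananas']
```

Both fruit tasks become eligible at the same moment, which is what allows them to run side by side. Whether they actually run concurrently also depends on the executor and the configured parallelism; the SequentialExecutor, for example, runs only one task at a time.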
