How to run the same DAG two times in a single run in Airflow

I am absolutely new to Airflow. I have a requirement to run two EMR jobs. Currently I have a Python script that depends on some input files; if they are present, it triggers an EMR job.

My new requirement is that I will have two different input files (of the same type), and these two files will be the inputs to the EMR jobs. In both cases Spark does the same thing; only the input file differs.

create_job_workflow = EmrCreateJobFlowOperator(
    task_id='some-task',
    job_flow_overrides=job_flow_args,
    aws_conn_id=aws_conn,
    emr_conn_id=emr_conn,
    dag=dag
)

How can I run the same DAG twice, changing only the input file inside spark-submit? Basically, whenever I click 'Trigger DAG', it should take two different input files and trigger two different EMR jobs on two different EMR clusters. Can anyone suggest a best practice for this? Or is it somehow possible by setting max_active_runs=2?

The best practice is to have two different tasks for it. Setting max_active_runs=2 only limits the number of concurrent DAG runs to 2; it does not make one trigger start two runs. You can use any data structure to hold the config for your tasks, iterate over it, and build one task per entry.
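As a rough sketch of the "iterate over a config" idea, the helper below builds one set of operator arguments per input file. The function names, the `--input` flag, and the script path `s3://my-bucket/job.py` are hypothetical placeholders, not something from the question; the step layout assumes the usual EMR step dictionary shape (Name, ActionOnFailure, HadoopJarStep).

```python
import copy


def spark_step(index, input_file):
    """Build one EMR step that runs spark-submit on the given input file."""
    return {
        "Name": f"process-file-{index}",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # Hypothetical script location and flag; replace with your own.
            "Args": ["spark-submit", "s3://my-bucket/job.py", "--input", input_file],
        },
    }


def build_task_kwargs(base_overrides, input_files):
    """Return one dict of operator keyword arguments per input file."""
    all_kwargs = []
    for index, input_file in enumerate(input_files):
        # Deep-copy so the clusters do not share (and mutate) one config dict.
        overrides = copy.deepcopy(base_overrides)
        overrides["Steps"] = [spark_step(index, input_file)]
        all_kwargs.append(
            {"task_id": f"create_job_flow_{index}", "job_flow_overrides": overrides}
        )
    return all_kwargs


# In the DAG file, reusing job_flow_args, aws_conn, emr_conn and dag
# from the question (shown as a comment because it needs Airflow):
#
# for kwargs in build_task_kwargs(job_flow_args, ["s3://.../a", "s3://.../b"]):
#     EmrCreateJobFlowOperator(
#         aws_conn_id=aws_conn, emr_conn_id=emr_conn, dag=dag, **kwargs
#     )
```

Because each task gets its own task_id and its own copy of the cluster config, a single 'Trigger DAG' creates two independent EMR clusters inside one DAG run, and the two tasks can run in parallel since neither depends on the other.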

Another thing you can do:你可以做的另一件事:

You can receive the filename as the payload of your DAG run. Access it like: context['dag_run'].conf.get('filename')

Then retrigger the same DAG with a TriggerDagRunOperator, updating the payload with the other file.
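One thing to watch with the retrigger approach is that a DAG that always triggers itself never stops. A minimal sketch of the payload logic, assuming a fixed list of files and a hypothetical helper named `next_conf`, could look like this:

```python
def next_conf(current_conf, input_files):
    """Return the payload for the follow-up run, or None when all files are done.

    current_conf is the dict read from context['dag_run'].conf; it may be
    None or empty on a manual trigger, in which case the first file is assumed.
    """
    current = (current_conf or {}).get("filename", input_files[0])
    if current not in input_files:
        # Unknown file: do not retrigger rather than loop forever.
        return None
    position = input_files.index(current)
    if position + 1 < len(input_files):
        return {"filename": input_files[position + 1]}
    return None


# Inside a task callable (needs Airflow, so shown as a comment):
#
# payload = next_conf(context["dag_run"].conf, INPUT_FILES)
# if payload is not None:
#     ...  # pass payload as `conf` to a TriggerDagRunOperator for this dag_id
```

With two files, the first run processes the first file and retriggers with the second; the second run gets None back and ends the chain. How the payload reaches the trigger operator depends on the Airflow version in use, so treat the commented part as an outline only.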
