
How to run the same Python script multiple times using Airflow?

I'm trying to run my differential evolution script in Python, called differential_evolution.py. Each iteration runs for around 40 generations. I want to run 50 iterations in parallel using Airflow. I have set a random seed in my script so that each iteration produces different results.

Snippet of differential_evolution.py:

Optimizer() is a custom class I created to run the algorithm. solution stores the solution list in the list attribute x, and mape calculates the MAPE for the solution list x.

import numpy as np

for iteration in range(50):
    seed = np.random.randint(0, 1000)   # fresh seed for this iteration
    opt_obj = Optimizer()
    solution = opt_obj.run_optimizer()  # solution list is stored in solution.x
    mape = opt_obj.calc_performance(solution.x)
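
For reference, the loop above can be run in isolation against a minimal mock of the described interface. This is a hypothetical stand-in, not the real class; the method and attribute names are taken from the snippet, everything else is assumed:

```python
import random
from types import SimpleNamespace


class Optimizer:
    """Hypothetical stand-in for the real differential-evolution Optimizer."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def run_optimizer(self):
        # Pretend to evolve a population; the real class stores the
        # best solution list in the attribute `x`.
        return SimpleNamespace(x=[self._rng.random() for _ in range(5)])

    def calc_performance(self, x):
        # Placeholder MAPE-style score over the solution list.
        return sum(abs(v) for v in x) / len(x)
```

With a fixed seed the mock is reproducible, which makes it easy to sanity-check the loop before wiring it into a DAG.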

Each iteration creates two output files, abc.txt and xyz.csv, to store relevant information for different variables.

Snippet of the DAG script:

start >> create_cluster >> differential_evolution.py >> delete_cluster >> end

This runs fine, but it takes a long time when you run 50 iterations.


What I want is to create a DAG like this:

start >> create_cluster >> [iteration 1, iteration 2, ..., iteration 50] >> delete_cluster >> end, where each iteration outputs the same two files, abc_i.txt and xyz_i.csv (i is the ith iteration).
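
As a small illustration, the per-iteration file names could be derived from the iteration index with a helper like this (the naming pattern is taken from the question; the helper itself is hypothetical):

```python
def output_paths(i):
    """Return the two per-iteration output file names for iteration i."""
    return f"abc_{i}.txt", f"xyz_{i}.csv"
```

Tagging every output with the iteration index this way keeps the 50 parallel iterations from overwriting each other's files.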

You could do something like this:

from airflow.decorators import task
import numpy as np

@task
def task1():
    pass  # something here

@task
def task2():
    pass  # something here

@task
def task_optimizer():
    seed = np.random.randint(0, 1000)
    opt_obj = Optimizer()
    solution = opt_obj.run_optimizer()
    mape = opt_obj.calc_performance(solution.x)


start_task = task1()
end_task = task2()

for iteration in range(50):
    optimize_task = task_optimizer()
    start_task >> optimize_task >> end_task

    

You would need to adjust it so the optimizer generates random numbers properly, but something along these lines should work.
