
How to run the same Python script multiple times using Airflow?

I'm trying to run my differential evolution script in Python, called differential_evolution.py. Each iteration runs for around 40 generations. I want to run 50 iterations in parallel using Airflow. I have set a random seed in my script so that each iteration produces different results.

Snippet of differential_evolution.py:

Optimizer() is a custom class I created to run the algorithm. solution stores the solution list in the list attribute x, and mape calculates the MAPE for the solution list x.

import numpy as np

for iteration in range(50):
    seed = np.random.randint(0, 1000)   # fresh seed for this iteration
    opt_obj = Optimizer()
    solution = opt_obj.run_optimizer()  # solution list is stored in solution.x
    mape = opt_obj.calc_performance(solution.x)
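
For reference, the loop above can be run in isolation against a minimal mock of the described interface. This is a hypothetical stand-in, not the real class; the method and attribute names are taken from the snippet, everything else is assumed:

```python
import random
from types import SimpleNamespace


class Optimizer:
    """Hypothetical stand-in for the real differential-evolution Optimizer."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def run_optimizer(self):
        # Pretend to evolve a population; the real class stores the
        # best solution list in the attribute `x`.
        return SimpleNamespace(x=[self._rng.random() for _ in range(5)])

    def calc_performance(self, x):
        # Placeholder MAPE-style score over the solution list.
        return sum(abs(v) for v in x) / len(x)
```

With a fixed seed the mock is reproducible, which makes it easy to sanity-check the loop before wiring it into a DAG.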

Each iteration creates two output files, abc.txt and xyz.csv, to store relevant information for different variables.

Snippet of the DAG script:

start >> create_cluster >> differential_evolution.py >> delete_cluster >> end

This runs fine, but it takes a long time when you run 50 iterations.


What I want is to create a DAG like this:

start >> create_cluster >> [iteration 1, iteration 2, ..., iteration 50] >> delete_cluster >> end, where each iteration outputs the same two files, abc_i.txt and xyz_i.csv (i is the ith iteration).
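
As a small illustration, the per-iteration file names could be derived from the iteration index with a helper like this (the naming pattern is taken from the question; the helper itself is hypothetical):

```python
def output_paths(i):
    """Return the two per-iteration output file names for iteration i."""
    return f"abc_{i}.txt", f"xyz_{i}.csv"
```

Tagging every output with the iteration index this way keeps the 50 parallel iterations from overwriting each other's files.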

You could do something like this:

from airflow.decorators import task
import numpy as np

@task
def task1():
    pass  # something here

@task
def task2():
    pass  # something here

@task
def task_optimizer():
    seed = np.random.randint(0, 1000)
    opt_obj = Optimizer()
    solution = opt_obj.run_optimizer()
    mape = opt_obj.calc_performance(solution.x)


start_task = task1()
end_task = task2()

for iteration in range(50):
    optimize_task = task_optimizer()
    start_task >> optimize_task >> end_task

    

You would need to adjust it so the optimizer generates random numbers properly, but something along these lines should work.
