
How to set up two workers in Airflow

I have two workers and three tasks.

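from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

# default_args was not shown in the question; these values are placeholders.
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2020, 1, 1),
}

dag = DAG('dummy_for_testing', default_args=default_args)

# Each bash_command assumes the script is executable and on the worker's PATH.
t1 = BashOperator(
    task_id='print_task1',
    bash_command='task1.py',
    dag=dag)

t2 = BashOperator(
    task_id='print_task2',
    bash_command='task2.py',
    dag=dag)

t3 = BashOperator(
    task_id='print_task3',
    bash_command='task3.py',
    dag=dag)

# t2 runs only after t1 succeeds, and t3 only after t2 succeeds.
t1 >> t2 >> t3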

Let's say I am performing tasks (t1, t2, t3) on a particular file. Currently, everything runs on one worker, but I want to set up another worker that takes the output of the first task and performs task t2 and then task t3, so that queue1 can perform t1 for the next file in the meantime. How can I make this work with two workers? I am thinking of using queues, but I couldn't understand how to make queue2 wait until task t1 in queue1 has finished.

You shouldn't have to do anything other than start both workers. They will pick up tasks as they become available, within the concurrency and parallelism limits defined in your configuration (for example `parallelism` under `[core]` and `worker_concurrency` under `[celery]` in airflow.cfg).

In the example you gave, the tasks might run entirely on worker 1, entirely on worker 2, or on a mixture of both. This is because t2 won't start until t1 has completed. In the time between t1 completing and t2 starting, both workers will be idle (assuming you don't have other DAGs running), and whichever one claims t2 first will run it.
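The race described above can be illustrated with a small toy simulation in plain Python. This is not Airflow code: `run_dag`, `UPSTREAM` and the worker names are hypothetical, and the random choice only stands in for "whichever idle worker polls for work first". It shows that the worker assignment can change from run to run while the execution order t1, t2, t3 never does.

```python
import random

# Toy model (not Airflow code): a task becomes "ready" only when all of its
# upstream tasks have succeeded, mirroring what the scheduler enforces.
UPSTREAM = {"t1": [], "t2": ["t1"], "t3": ["t2"]}


def run_dag(workers, seed=None):
    """Simulate idle workers racing to claim ready tasks.

    Returns a list of (task_id, worker) pairs in execution order.
    """
    rng = random.Random(seed)
    done = set()
    log = []
    while len(done) < len(UPSTREAM):
        ready = [t for t, ups in UPSTREAM.items()
                 if t not in done and all(u in done for u in ups)]
        for task in ready:
            # Every idle worker polls for work; the random pick models
            # whichever worker happens to win the race for this task.
            winner = rng.choice(workers)
            log.append((task, winner))
            done.add(task)
    return log


if __name__ == "__main__":
    for seed in range(3):
        print(run_dag(["worker1", "worker2"], seed=seed))
```

Different seeds produce different worker assignments, but every run executes t1, then t2, then t3.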

If you need specific tasks to run on different workers (say, to have one or more workers with more resources available, or special hardware), you can specify the queue at the task level. The queue won't affect the order in which tasks run, because the Airflow scheduler ensures a task doesn't run until the tasks upstream of it have completed successfully. So queue2 does not need to wait for t1 explicitly: t2 is not sent to any queue until t1 has succeeded.
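In Airflow itself, task-level routing is done by passing `queue=` to an operator (for example `BashOperator(..., queue='queue2')`) and, with the CeleryExecutor in Airflow 2.x, starting a worker bound to that queue with `airflow celery worker --queues queue2`. The sketch below does not use Airflow; it is a hypothetical plain-Python model of that setup, assuming t1 is routed to queue1, t2 and t3 to queue2, one worker per queue, and one task per worker per tick. It shows why queue2 never has to "wait" explicitly: a file's t2 only becomes queued once that file's t1 has succeeded.

```python
from collections import deque

# Toy model (not Airflow code) of task-level queue routing.
# Hypothetical routing: t1 goes to queue1, t2 and t3 go to queue2.
TASK_QUEUE = {"t1": "queue1", "t2": "queue2", "t3": "queue2"}
UPSTREAM = {"t1": None, "t2": "t1", "t3": "t2"}
# Each worker only consumes from the queues it listens on.
WORKER_QUEUES = {"worker1": {"queue1"}, "worker2": {"queue2"}}


def simulate(files):
    """Run one t1 >> t2 >> t3 pipeline per file, one task per worker per tick.

    Returns a list of ticks; each tick maps worker -> (file, task) or None.
    """
    done = set()  # (file, task) pairs that have succeeded
    timeline = []
    while len(done) < len(files) * len(TASK_QUEUE):
        # "Scheduler": queue a task only once its upstream task has succeeded.
        queued = {q: deque() for q in set(TASK_QUEUE.values())}
        for f in files:
            for task, up in UPSTREAM.items():
                if (f, task) not in done and (up is None or (f, up) in done):
                    queued[TASK_QUEUE[task]].append((f, task))
        # Each worker claims at most one task from the queues it listens on.
        tick = {}
        for worker, queues in WORKER_QUEUES.items():
            item = None
            for q in sorted(queues):
                if queued[q]:
                    item = queued[q].popleft()
                    break
            tick[worker] = item
        if not any(tick.values()):
            raise RuntimeError("pending work sits on a queue no worker listens on")
        # Tasks claimed in this tick finish by the end of the tick.
        done.update(item for item in tick.values() if item)
        timeline.append(tick)
    return timeline


if __name__ == "__main__":
    for i, tick in enumerate(simulate(["a", "b"]), start=1):
        print(i, tick)
```

In the second tick, worker1 runs t1 for file `b` while worker2 runs t2 for file `a`, which is the overlap the question asks for, and no file's t2 ever starts before its own t1.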
