Use existing Celery workers for Airflow's CeleryExecutor workers
I am trying to introduce dynamic workflows into my landscape involving multiple steps of different model inference, where the output from one model gets fed into another model. Currently we have a few Celery workers spread across hosts to manage the inference chain. As the complexity increases, we are attempting to build workflows on the fly.
For that purpose, I have a dynamic DAG setup working with the CeleryExecutor. Now, is there a way I can retain the current Celery setup and route Airflow-driven tasks to the same workers? I do understand that these workers need access to the same DAG folders and environment as the Airflow server.

I want to know how the Celery workers need to be started on these servers so that Airflow can route the same tasks that used to be run by the manual workflow from a Python application. If I start the workers using the command "airflow celery worker", I cannot access my application tasks. If I start Celery the way it is currently started, i.e. "celery -A proj", Airflow has nothing to do with it. Looking for ideas to make this work.
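For reference, the two startup modes described above look like this (worker options shown are illustrative, not from the original post):

```shell
# Attempt 1: Airflow-managed worker. This registers Airflow's own
# execution tasks, but not the application's Celery tasks.
airflow celery worker

# Attempt 2: the existing application worker. It serves the tasks in
# the "proj" Celery app, but Airflow does not route work to it.
celery -A proj worker --loglevel=INFO
```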
Thanks @DejanLekic. I got it working (though the DAG task-scheduling latency was so high that I ultimately dropped the approach). If someone is looking to see how this was accomplished, here are a few things I did to get it working.
Note: It is not necessary to split the send_task() and .get() between two tasks of the DAG. By splitting them between parent and child, I was trying to take advantage of the lag between tasks. But in my case, the Celery execution of the tasks completed faster than Airflow's inherent latency in scheduling dependent tasks.