
Use existing Celery workers for Airflow's CeleryExecutor workers

I am trying to introduce dynamic workflows into my landscape, involving multiple steps of model inference where the output of one model is fed into another. Currently we have a few Celery workers spread across hosts to manage the inference chain. As the complexity increases, we are attempting to build workflows on the fly. For that purpose, I got a dynamic DAG setup working with the CeleryExecutor.

Now, is there a way I can retain the current Celery setup and route Airflow-driven tasks to the same workers? I do understand that these workers should have access to the same DAG folders and environment as the Airflow server. I want to know how the Celery workers need to be started on these servers so that Airflow can route the same tasks that used to be run as a manual workflow from a Python application. If I start the workers using the command "airflow celery worker", I cannot access my application tasks. If I start Celery the way it is currently started, i.e. "celery -A proj", Airflow has nothing to do with it. Looking for ideas to make this work.

Thanks @DejanLekic. I got it working (though the DAG task-scheduling latency was so high that I eventually dropped the approach). If someone wants to see how this was accomplished, here are the things I did to get it working.

  1. Change airflow.cfg to update the executor, queue and result back-end settings (obvious); see the config sketch after this list.
  2. If the Celery workers are spawned outside the Airflow umbrella, change the celery_app_name setting from airflow.executors.celery_executor to celery.execute and change the executor to "LocalExecutor". I have not tested this, but it may even be possible to avoid switching to the Celery executor by registering Airflow's tasks in the project's Celery app.
  3. Each task now calls send_task(); the returned AsyncResult object is stored either in XCom (implicitly or explicitly) or in Redis (implicitly pushed to the queue). The child task then gathers the AsyncResult (an implicit call to fetch the value from XCom or Redis) and calls .get() to obtain the result of the previous step. See the sketch after this list.
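For reference, the airflow.cfg changes behind steps 1 and 2 look roughly like this. It is a minimal sketch assuming Airflow 1.10-era key names; the broker and result-backend URLs are placeholders for whatever the existing Celery workers already use.

    [core]
    # Step 1: CeleryExecutor; switch to LocalExecutor if you follow step 2 instead.
    executor = CeleryExecutor

    [celery]
    # Point these at the broker / result backend the existing workers already use.
    broker_url = redis://localhost:6379/0
    result_backend = redis://localhost:6379/1
    # Step 2 (untested by the author): point Airflow at the project's Celery app
    # instead of the default airflow.executors.celery_executor.
    # celery_app_name = celery.execute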
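Step 3 in code: a minimal sketch, not the author's exact implementation. It assumes a Celery app called "proj" reachable from the Airflow workers, a hypothetical remote task named proj.tasks.run_inference, and Airflow 1.10-style PythonOperator tasks; the Celery task id is handed from parent to child via XCom, and the child blocks on .get().

    from datetime import datetime

    from celery import Celery
    from celery.result import AsyncResult
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    # Same broker / result backend as the existing Celery workers (placeholder URLs).
    celery_app = Celery("proj",
                        broker="redis://localhost:6379/0",
                        backend="redis://localhost:6379/1")

    def submit_inference(**context):
        # send_task() only needs the task *name*, so the Airflow worker does not
        # have to import the project code that the remote Celery worker runs.
        async_result = celery_app.send_task("proj.tasks.run_inference",
                                            args=[context["ds"]])
        # Returning the id pushes it to XCom implicitly.
        return async_result.id

    def collect_inference(**context):
        task_id = context["ti"].xcom_pull(task_ids="submit_inference")
        # Rebuild the AsyncResult from the id and block until the result arrives.
        return AsyncResult(task_id, app=celery_app).get()

    with DAG("celery_bridge",
             start_date=datetime(2020, 1, 1),
             schedule_interval=None) as dag:
        submit = PythonOperator(task_id="submit_inference",
                                python_callable=submit_inference,
                                provide_context=True)
        collect = PythonOperator(task_id="collect_inference",
                                 python_callable=collect_inference,
                                 provide_context=True)
        submit >> collect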

Note: it is not necessary to split send_task() and .get() between two tasks of the DAG. By splitting them between parent and child, I was trying to take advantage of the lag between tasks. But in my case, the Celery tasks completed faster than Airflow's inherent latency in scheduling the dependent task.
