
How to execute plain Celery tasks on Airflow workers

I currently have Airflow set up and working correctly, using the CeleryExecutor as a backend to provide horizontal scaling. This works remarkably well, especially with the worker nodes sitting in an autoscaling group on EC2.

In addition to Airflow, I use plain Celery to handle simple asynchronous tasks (ones that don't need a whole pipeline) coming from Flask/Python. Until now, these plain Celery tasks were very low volume, and I just ran the plain Celery worker on the same machine as Flask. There is now a requirement to run a massive number of plain Celery tasks in the system, so I need to scale my plain Celery setup as well.
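For context, the plain Celery side is just a standard app with decorated task functions, along these lines (a minimal sketch; the broker URL and task name are placeholders, not my actual configuration):

    from celery import Celery

    # Plain Celery app used by Flask for simple asynchronous work
    # (broker URL is an assumed placeholder).
    celery = Celery("flask_tasks", broker="redis://localhost:6379/0")

    @celery.task
    def send_notification(user_id):
        # Simple one-off job that doesn't need a whole Airflow pipeline.
        print(f"notifying user {user_id}")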

One way to do this would be to run the plain Celery worker service on the Airflow worker servers as well (to benefit from the autoscaling, etc.), but this doesn't seem like an elegant solution, since it creates two different "types" of Celery worker on the same machine. My question is whether there is some combination of configuration settings I can pass to my plain Celery app that will cause @celery.task decorated functions to be executed directly on my Airflow worker cluster as plain Celery tasks, completely bypassing the Airflow middleware.
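Concretely, the kind of configuration I have in mind would be pointing the plain Celery app at the broker and queue that Airflow's CeleryExecutor uses. A rough sketch, assuming the values are copied from the [celery] section of airflow.cfg (broker_url and default_queue; the ones below are placeholders), and noting that the Airflow workers would presumably still need the task's module importable in order to run it:

    from celery import Celery

    # Point the plain Celery app at Airflow's broker and default queue.
    # These values must match the [celery] section of airflow.cfg;
    # the ones shown here are assumptions.
    app = Celery("plain_tasks", broker="redis://airflow-broker:6379/0")
    app.conf.task_default_queue = "default"  # Airflow's default_queue

    @app.task
    def crunch(payload):
        return len(payload)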

Thanks for the help.

The application is airflow.executors.celery_executor.app, if I remember well. Try celery -A airflow.executors.celery_executor.app inspect active against your current Airflow infrastructure to test it. However, I suggest you do not do this, because your Celery tasks may affect the execution of Airflow DAGs, and that may affect your SLAs.
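If you want to check the same thing from Python rather than the CLI, a small sketch (assuming your Airflow version still exposes the Celery app at that import path):

    # Reuse Airflow's own Celery app, which picks up the broker/backend
    # settings from airflow.cfg, to inspect the running workers.
    from airflow.executors.celery_executor import app

    replies = app.control.inspect().active()
    print(replies)  # maps each Airflow worker to its active tasks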

What we do in the company I work for is exactly what you suggested - we maintain a large Celery cluster, and we sometimes offload the execution of some Airflow tasks to that Celery cluster, depending on the use case. This is particularly handy when a task in our Airflow DAG actually triggers tens of thousands of small jobs. Our Celery cluster runs 8 million tasks on a busy day.
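The offloading pattern can be as simple as an Airflow task dispatching jobs to the separate Celery cluster by name with send_task, so the job code only has to live on that cluster. A hedged sketch; the broker URL and task name are illustrative, not our actual setup:

    from celery import Celery

    # Client pointing at the dedicated Celery cluster (placeholder URL).
    celery_cluster = Celery(broker="redis://celery-cluster:6379/0")

    def fan_out(job_ids):
        # Called from inside an Airflow task, e.g. a PythonOperator callable.
        for job_id in job_ids:
            celery_cluster.send_task("jobs.process", args=[job_id])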
