Serial processing of specific tasks using Celery with concurrency

I have a Python/Celery setup: there is a queue named "task_queue" and multiple Python scripts that feed it data from different sensors. A Celery worker reads from that queue and sends an alarm to the user if a sensor's value changed from high to low. The worker has multiple threads (I have the autoscaling parameter enabled) and everything works fine until one sensor sends multiple messages at once. That is when I hit a race condition and may send multiple alarms to the user: before one thread stores the information that an alarm has already been sent, a few other threads send it as well.

I have n sensors (n can be more than 10000) and messages from any one sensor must be processed sequentially. In theory I could have n threads, but that would be overkill. I'm looking for the simplest way to distribute the messages evenly across x threads (usually 10 or 20), so that I don't have to (re)write a routing function and define new queues each time I want to increase or decrease x.
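For context, the routing function I would rather not maintain might look roughly like the sketch below: a callable registered in task_routes that hashes a sensor id onto one of x queues (the names route_by_sensor, sensor_id and NUM_QUEUES are made up for illustration, and each queue would still need its own single-threaded consumer to actually get serial execution):

import zlib

from celery import Celery

NUM_QUEUES = 10  # the "x" above; changing it means re-hashing sensors onto queues

app = Celery('sensors', broker='redis://localhost:6379/0')

def route_by_sensor(name, args, kwargs, options, task=None, **kw):
    # Pin all messages from one sensor to one queue so they stay in order.
    # zlib.crc32 is used instead of hash(), which is not stable across
    # Python processes (the producers here are separate scripts).
    sensor_id = kwargs.get('sensor_id')
    if sensor_id is not None:
        return {'queue': 'queue_%d' % (zlib.crc32(str(sensor_id).encode()) % NUM_QUEUES)}
    return None  # not a sensor task: fall back to default routing

app.conf.task_routes = (route_by_sensor,)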

So is it possible to somehow mark the tasks that originate from the same sensor so that they are executed serially (when calling delay or apply_async)? Or is there a different queue/worker architecture I should be using to achieve that?

From what I understand, you have some tasks that can all run at the same time and a specific task that cannot (it needs to be executed one at a time).

There is no way (for now) to set the concurrency of a specific task queue, so I think the best approach in your situation is to handle the problem with multiple workers.

Let's say you have the following queues:

  • queue_1: here we send tasks that can all run at the same time.
  • queue_2: here we send tasks that must run one at a time.

You could start Celery with the following commands (if you want both workers on the same machine):

celery -A proj worker --loglevel=INFO --concurrency=10 -n worker1@%h -Q queue_1
celery -A proj worker --loglevel=INFO --concurrency=1 -n worker2@%h -Q queue_2
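Because worker2 is started with --concurrency=1, a single process consumes queue_2, so those tasks are guaranteed to execute strictly one at a time; scaling the parallel side up or down is just a matter of changing worker1's --concurrency value.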

This will make worker1, which has concurrency 10, handle all the tasks that can run at the same time, while worker2 handles only the tasks that need to run one at a time.

Here is some documentation for reference: https://docs.celeryproject.org/en/stable/userguide/workers.html

NOTE: you will need to specify which queue each task runs in. This can be done when calling apply_async, directly from the task decorator, or in some other ways.
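For example, here is a minimal sketch of those options (the module and task names proj, send_alarm and check_sensor are placeholders, not from your setup):

from celery import Celery

app = Celery('proj', broker='redis://localhost:6379/0')

# 1) Directly in the decorator: every call defaults to queue_2.
@app.task(queue='queue_2')
def send_alarm(sensor_id, value):
    ...

# 2) Via the routing configuration, keeping the task definition clean.
app.conf.task_routes = {'proj.check_sensor': {'queue': 'queue_1'}}

@app.task
def check_sensor(sensor_id, value):
    ...

# 3) Per call, when invoking with apply_async:
# send_alarm.apply_async(kwargs={'sensor_id': 42, 'value': 'low'}, queue='queue_2')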
