
Celery + RabbitMQ becomes slow

I'm here with a performance issue that I can't seem to figure out.

The problem is that executing tasks is too slow. Based on the Celery log, most of the tasks finish in under 0.3 seconds.

I noticed that if I stop the workers and start them again, performance increases to almost 200 acks/second; then, after a while, it drops to around 40/s.

I'm not sure, but I think it might be a broker issue rather than a Celery issue. Looking at the logs of a couple of workers, I noticed that they all seem to execute tasks, then stop for a bit and start again.

It feels like receiving tasks is slow.

Any ideas about what might cause this? Thanks!


A log example:

Task drones.tasks.blue_drone_process_task[64c0a826-aa18-4226-8a39-3a455e0916a5] succeeded in 0.18421914400005335s: None

(10-second break)

Received task: drones.tasks.blue_drone_process_task[924a1b99-670d-492e-94a1-91d5ff7142b9]
Received task: drones.tasks.blue_drone_process_task[74a9a1d3-aa2b-40eb-9e5a-1420ea8b13d1]
Received task: drones.tasks.blue_drone_process_task[99ae3ca1-dfa6-4854-a624-735fe0447abb]
Received task: drones.tasks.blue_drone_process_task[dfbc0d65-c189-4cfc-b6f9-f363f25f2713]

IMO those tasks should execute so fast that I shouldn't be able to read the log.


My setup is:

  • celery 4.2.1
  • RabbitMQ 3.7.8
  • Erlang 21.1

I use this setup for web scraping and have 2 queues. Let's call them Requests and Process.

In the Requests queue I put URLs that need to be scraped, and in the Process queue you will find the URL + source code of that page (max 2.5 MB per source page; I drop it if it's bigger than that), so all messages in the Process queue are at most 2.5 MB ± 1 KB.

To execute tasks from the Requests queue I use Celery with the gevent pool, concurrency 300 (-P gevent -c 300 --without-gossip --without-mingle --without-heartbeat). I run 4-8 workers like this.

To execute tasks from the Process queue I use the prefork pool (default) (-c 4 --without-gossip --without-mingle --without-heartbeat). I run 30 workers like this.
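For context, the overall flow looks roughly like the sketch below (the scrape task name, broker URL and size check are illustrative; blue_drone_process_task is the real task name from the log above): the gevent workers fetch pages from the Requests queue and push URL + source code to the Process queue, where the prefork workers consume them.

    # Rough sketch of the two-queue flow (illustrative, not my exact code).
    from celery import Celery
    import requests  # the HTTP library, unrelated to the "Requests" queue

    app = Celery('drones', broker='amqp://user:pass@haproxy-host:5672//')

    MAX_SOURCE_SIZE = int(2.5 * 1024 * 1024)  # pages bigger than ~2.5 MB are dropped

    @app.task(queue='requests')
    def scrape_url(url):
        # Runs in the gevent workers (-P gevent -c 300): fetch the page source.
        source = requests.get(url, timeout=30).text
        if len(source.encode('utf-8')) <= MAX_SOURCE_SIZE:
            blue_drone_process_task.delay(url, source)

    @app.task(queue='process')
    def blue_drone_process_task(url, source):
        # Runs in the prefork workers (-c 4): parse/store the scraped page.
        ...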


Other setup info:

  • disabled heartbeats in Celery and RabbitMQ, use TCP keep-alive
  • everything is in AWS
  • c4.xlarge instances for workers
  • i3.xlarge for RabbitMQ (30 GB RAM, 765 GB NVMe SSD, 4 cores)
  • haproxy for load balancing (I had 2 x RabbitMQ nodes clustered for HA, fully replicated; I stopped one thinking that might cause the issue, but I left the load balancer in place in case I decide to recreate the cluster)

RabbitMQ config:

  • heartbeat = 0
  • lazy_queue_explicit_gc_run_operation_threshold = 500
  • proxy-protocol = true
  • vm_memory_high_watermark = 0.6
  • vm_memory_high_watermark_paging_ratio = 0.1
  • queue_index_embed_msgs_below = 4096

Celery config:

  • CELERY_TASK_ACKS_LATE = False (tried both ways)
  • CELERY_RESULT_BACKEND = None
  • CELERY_WORKER_ENABLE_REMOTE_CONTROL = True
  • BROKER_HEARTBEAT = 0
  • CELERY_CONTROL_QUEUE_EXPIRES = 60
  • CELERY_BROKER_CONNECTION_TIMEOUT = 30
  • CELERY_WORKER_PREFETCH_MULTIPLIER = 1
  • workers running with -Ofair
  • max-tasks-per-child = 10 (tried without it as well)

I tried using a higher prefetch, like 5, 10 and 20, and it did not help.
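For reference, here is a minimal sketch of how those settings map onto the Celery 4 lowercase setting names (the broker URL is a placeholder):

    # Celery settings mirroring the list above (sketch only).
    from celery import Celery

    app = Celery('drones', broker='amqp://user:pass@haproxy-host:5672//')
    app.conf.update(
        task_acks_late=False,            # tried True as well
        result_backend=None,             # results are not stored
        worker_enable_remote_control=True,
        broker_heartbeat=0,              # heartbeats off, relying on TCP keep-alive
        control_queue_expires=60,
        broker_connection_timeout=30,
        worker_prefetch_multiplier=1,    # also tried 5, 10 and 20
    )
    # Workers additionally run with -O fair and --max-tasks-per-child 10.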

In case this helps:

Managed to figure it out. It was a networking issue. The EC2 instance that I used for the load balancer had low networking performance. I picked a new instance type with better networking performance and it works amazingly fast.
