Large celery task memory leak

I have a huge celery task that works basically like this:

 @task
 def my_task(id):
   if settings.DEBUG:
     print "Don't run this with debug on."
     return False

   related_ids = get_related_ids(id)

   chunk_size = 500

   for i in xrange(0, len(related_ids), chunk_size):
     ids = related_ids[i:i+chunk_size]
     MyModel.objects.filter(pk__in=ids).delete()
     print_memory_usage()

I also have a manage.py command that just runs my_task(int(args[0])), so this can either be queued or run on the command line.

When run on the command line, print_memory_usage() reveals a relatively constant amount of memory used.

When run inside celery, print_memory_usage() reveals an ever-increasing amount of memory, continuing until the process is killed (I'm using Heroku with a 1 GB memory limit, but other hosts would have a similar problem). The memory leak appears to correspond with the chunk_size: if I increase the chunk_size, the memory consumption per print increases. This seems to suggest that either celery is logging queries itself, or something else in my stack is.

Does celery log queries somewhere else?

Other notes:

  • DEBUG is off.
  • This happens both with RabbitMQ and with Amazon's SQS as the queue.
  • This happens both locally and on Heroku (though it doesn't get killed locally, since that machine has 16 GB of RAM).
  • The task actually goes on to do more than just deleting objects. Later it creates new objects via MyModel.objects.get_or_create(). This exhibits the same behavior: memory grows under celery but not under manage.py.

A bit of necroposting, but this can help people in the future. Although the best solution is to track down the source of the problem, sometimes that is not possible, for example because the source of the problem is outside of our control. In that case you can use the --max-memory-per-child option when spawning the Celery worker process.
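
For reference, a minimal sketch of that option, assuming a Celery app object named app in a hypothetical module myproject; the value is expressed in kilobytes, and the pool replaces a child process after the task that pushed it over the limit finishes:

 # Sketch: recycle a pool process once its resident memory exceeds the cap.
 # `app`, the module name, and the broker URL are placeholders for your own setup.
 from celery import Celery

 app = Celery("myproject", broker="amqp://localhost")

 # Value is in kilobytes; roughly 512 MB here.
 app.conf.worker_max_memory_per_child = 512 * 1024

 # Equivalent when starting the worker from the command line:
 #   celery -A myproject worker --max-memory-per-child=524288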

This turned out not to have anything to do with celery. Instead, it was New Relic's logger that consumed all of that memory. Despite DEBUG being set to False, it was storing every SQL statement in memory in preparation for sending it to their logging server. I do not know if it still behaves this way, but it wouldn't flush that memory until the task fully completed.

The workaround was to use subtasks for each chunk of ids, so that each delete operates on a finite number of items.
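
A minimal sketch of that pattern, reusing MyModel and get_related_ids from the question (their imports are omitted) and a hypothetical delete_chunk task; the parent task only slices the id list and enqueues a bounded subtask per chunk:

 from celery import shared_task

 CHUNK_SIZE = 500  # upper bound on the work done by any one subtask

 @shared_task
 def delete_chunk(ids):
     # Each chunk runs as its own task, so whatever a logger or APM agent
     # accumulates per task is released when the subtask completes.
     MyModel.objects.filter(pk__in=ids).delete()

 @shared_task
 def my_task(id):
     related_ids = get_related_ids(id)
     for i in range(0, len(related_ids), CHUNK_SIZE):
         delete_chunk.delay(related_ids[i:i + CHUNK_SIZE])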

The reason this wasn't a problem when running it as a management command is that New Relic's logger wasn't integrated into the command framework.

The other solutions presented here attempted either to reduce the overhead of the chunking operation, which doesn't help with an O(N) scaling concern, or to force the celery tasks to fail if a memory limit is exceeded (a feature that didn't exist at the time, but might eventually have worked with infinite retries).

You can, however, run the worker with the --autoscale n,0 option. If the minimum pool size is 0, celery will kill idle workers and the memory will be released.

But this is not a good solution.

A lot of memory is used by Django's Collector: before deleting an object, it collects all related objects and deletes them first. You can set on_delete to SET_NULL on model fields.
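
A minimal sketch of that field change, with hypothetical Parent/Child models; SET_NULL requires the column to be nullable, and deleting a Parent then nulls the foreign key on related rows instead of deleting them:

 from django.db import models

 class Parent(models.Model):
     name = models.CharField(max_length=100)

 class Child(models.Model):
     # Deleting a Parent sets this column to NULL on related Child rows
     # rather than cascading a delete through every related object.
     parent = models.ForeignKey(Parent, on_delete=models.SET_NULL, null=True)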

Another possible solution is to delete objects with a limit, for example only so many objects per hour. That will lower memory usage.

Django does not have a public raw_delete API. You can use raw SQL for this.
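
A minimal sketch of the raw-SQL route, assuming a hypothetical table name myapp_mymodel and a list of primary keys; explicit placeholders keep it portable across backends, and the table name must never come from user input:

 from django.db import connection

 def raw_delete_by_pk(pks, table="myapp_mymodel"):
     # Deletes rows directly, bypassing Django's Collector (and its memory use).
     if not pks:
         return
     placeholders = ", ".join(["%s"] * len(pks))
     with connection.cursor() as cursor:
         cursor.execute(
             "DELETE FROM {} WHERE id IN ({})".format(table, placeholders),
             list(pks),
         )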
