
Large celery task memory leak

I have a huge celery task that works basically like this:

 @task
 def my_task(id):
   if settings.DEBUG:
     print "Don't run this with debug on."
     return False

   related_ids = get_related_ids(id)

   chunk_size = 500

   for i in xrange(0, len(related_ids), chunk_size):
     ids = related_ids[i:i+chunk_size]
     MyModel.objects.filter(pk__in=ids).delete()
     print_memory_usage()

I also have a manage.py command that just runs my_task(int(args[0])), so this can either be queued or run on the command line.
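
For reference, that command is essentially just a thin wrapper like the sketch below (simplified; old-style positional args, hence args[0], and an illustrative import path):

 from django.core.management.base import BaseCommand

 from myapp.tasks import my_task  # illustrative import path


 class Command(BaseCommand):
     args = '<id>'
     help = 'Run my_task synchronously, outside of celery.'

     def handle(self, *args, **options):
         # Same code path as the queued task, just without a worker.
         my_task(int(args[0]))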

When run on the command line, print_memory_usage() reveals a relatively constant amount of memory used.

When run inside celery, print_memory_usage() reveals an ever-increasing amount of memory, continuing until the process is killed (I'm using Heroku with a 1 GB memory limit, but other hosts would have a similar problem). The memory leak appears to correspond with the chunk_size; if I increase the chunk_size, the memory consumption increases per-print. This seems to suggest that either celery is logging queries itself, or something else in my stack is.

Does celery log queries somewhere else?

Other notes:

  • DEBUG is off.
  • This happens both with RabbitMQ and Amazon's SQS as the queue.
  • This happens both locally and on Heroku (though it doesn't get killed locally, since the local machine has 16 GB of RAM).
  • The task actually goes on to do more things than just deleting objects. Later it creates new objects via MyModel.objects.get_or_create(). This also exhibits the same behavior (memory grows under celery, doesn't grow under manage.py).

A bit of necroposting, but this may help people in the future. Although the best solution is to track down the source of the problem, sometimes that isn't possible, for example because the source of the problem is outside of your control. In that case you can use the --max-memory-per-child option when spawning the Celery worker process.
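
For example, a minimal invocation (the app module name and the limit are placeholders; the value is in kilobytes):

 # Recycle a worker child process once it exceeds roughly 200 MB of resident memory.
 celery -A proj worker --loglevel=info --max-memory-per-child=200000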

This turned out not to have anything to do with celery. Instead, it was New Relic's logger that consumed all of that memory. Despite DEBUG being set to False, it was storing every SQL statement in memory in preparation for sending it to their logging server. I do not know if it still behaves this way, but it wouldn't flush that memory until the task fully completed.

The workaround was to use subtasks for each chunk of ids, so that the delete is done on a finite number of items per task.
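
Schematically, the fix looked like this (simplified, reusing the names from the question and the old-style celery.task decorator to match the snippet above):

 from celery import group
 from celery.task import task


 @task
 def delete_chunk(ids):
     # Each chunk is its own task, so any per-task buffering (e.g. an agent
     # recording every SQL statement) is bounded by the chunk size.
     MyModel.objects.filter(pk__in=ids).delete()


 @task
 def my_task(id):
     related_ids = get_related_ids(id)
     chunk_size = 500
     # Fan out one subtask per chunk instead of looping inside a single task.
     group(delete_chunk.s(related_ids[i:i + chunk_size])
           for i in xrange(0, len(related_ids), chunk_size)).apply_async()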

The reason this wasn't a problem when running it as a management command is that New Relic's logger wasn't integrated into the command framework.

Other solutions presented here attempted to reduce the overhead of the chunking operation, which doesn't help with an O(N) scaling concern, or to force the celery tasks to fail if a memory limit is exceeded (a feature that didn't exist at the time, but might eventually have worked with infinite retries).

You could, however, run the worker with the --autoscale n,0 option. If the minimum pool size is 0, celery will kill idle workers and the memory will be released.
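
For example (the app module name is a placeholder):

 # Allow up to 10 pool processes, scale down to 0 when idle.
 celery -A proj worker --autoscale=10,0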

But this is not a good solution.

A lot of memory is used by Django's Collector: before deleting, it collects all related objects and deletes them first. You can set on_delete to SET_NULL on model fields.
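
For example, a sketch of such a field (the related model and field names are only illustrative):

 from django.db import models


 class RelatedThing(models.Model):
     # With SET_NULL, deleting a MyModel no longer cascade-deletes these rows,
     # which is what makes the Collector's work expensive.
     my_model = models.ForeignKey('MyModel', null=True,
                                  on_delete=models.SET_NULL)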

Another possible solution is deleting objects in limited batches, for example a certain number of objects per hour. That will lower memory usage.
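
As a sketch of that idea (the schedule, batch size, and the to_delete flag are all assumptions, using the old periodic_task API to match the era of the question):

 from celery.schedules import crontab
 from celery.task import periodic_task


 @periodic_task(run_every=crontab(minute=0))  # once per hour
 def delete_some():
     # Delete at most 500 rows per run so memory use stays bounded.
     ids = list(MyModel.objects.filter(to_delete=True)
                               .values_list('pk', flat=True)[:500])
     MyModel.objects.filter(pk__in=ids).delete()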

Django does not have a raw_delete. You can use raw SQL for this.
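
For example, something along these lines (the table name is a guess, and the tuple-parameter style shown works with PostgreSQL/psycopg2):

 from django.db import connection


 def raw_delete(ids):
     # Bypasses Django's Collector entirely: no related objects are loaded,
     # but cascades and signals are skipped, so use with care.
     cursor = connection.cursor()
     cursor.execute("DELETE FROM myapp_mymodel WHERE id IN %s", [tuple(ids)])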
