
Django related objects are missing from celery task (race condition?)

Strange behavior that I don't know how to explain. I've got a model, Track, with some related points. I call a celery task to perform some calculations with the points; they seem to be perfectly reachable in the calling method itself, but unavailable in the celery task.

@shared_task
def my_task(track):
    print 'in the task', track.id, track.points.all().count()

def some_method():
    t = Track()
    t.save()
    t = fill_with_points(t)  # creating points, attaching them to a Track
    t.save()
    print 'before the task', t.id, t.points.all().count()
    my_task.delay(t)

That prints the following:

before the task, 21346, 2971
in the task, 21346, 0

Oddly, when I put a time.sleep(10) on the first line of my_task, or before calling my_task at all, it works, as if there's some race condition. But the first printed line clearly shows that the points are available in the database, since it runs a select query (track.points.all().count()).

I'm going to assume this is due to transaction isolation.

Django transactions are commonly tied to requests (for example with ATOMIC_REQUESTS, or TransactionMiddleware before Django 1.6), and while a transaction is open, no other database connection sees its changes until it is committed. The code that made the changes reads through the same connection, which is why the count printed before the task already includes the points. If you're in the middle of a save method and a lot of other work happens before the request finishes, it is likely that Celery starts processing the task before the transaction is committed. You could fix this by committing manually or by delaying the task.
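The effect can be reproduced without Django or Celery. The sketch below is only an illustration using Python's built-in sqlite3 module: the `point` table and the `writer`/`reader` connections are made-up stand-ins for the request's connection and the Celery worker's connection, not anything taken from the question's code.

```python
import os
import sqlite3
import tempfile


def demo_isolation():
    """Return the row counts seen by the writer, and by the reader before and after commit."""
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
    writer = sqlite3.connect(path)  # stands in for the web request
    reader = sqlite3.connect(path)  # stands in for the Celery worker
    try:
        writer.execute("CREATE TABLE point (id INTEGER PRIMARY KEY, track_id INTEGER)")
        writer.commit()

        # The inserts open a transaction on the writer that is not committed yet.
        writer.executemany("INSERT INTO point (track_id) VALUES (?)", [(1,), (1,), (1,)])

        # Same connection: the uncommitted rows are visible.
        seen_by_writer = writer.execute("SELECT COUNT(*) FROM point").fetchone()[0]
        # Other connection: the uncommitted rows are not visible.
        seen_before_commit = reader.execute("SELECT COUNT(*) FROM point").fetchone()[0]

        writer.commit()
        seen_after_commit = reader.execute("SELECT COUNT(*) FROM point").fetchone()[0]
        return seen_by_writer, seen_before_commit, seen_after_commit
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    print(demo_isolation())
```

The writer counts its own rows right away, while the second connection only counts them once the commit has happened, which matches the "before the task" and "in the task" output in the question.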

You should NEVER pass model objects to celery tasks. The database connection in the celery worker is different from the one in your Django application, and the object you pass is a serialized copy that is not tied to the worker's connection, so it may be stale or behave badly. What you should do is send the id, for example track_id, and then fetch the object from the database inside the task. That should most likely solve your problem.

@shared_task
def my_task(track_id):
    track = Track.objects.get(pk=track_id)  # Django ORM lookup by primary key
    print 'in the task', track.id, track.points.all().count()

def some_method():
    t = Track()
    t.save()
    t = fill_with_points(t)  # creating points, attaching them to a Track
    t.save()
    print 'before the task', t.id, t.points.all().count()
    my_task.delay(t.id)  # Pass the id here, not the object
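One concrete way a passed object can go wrong is staleness: the worker gets whatever field values were serialized when the task was queued. The stand-alone sketch below illustrates this under simplifying assumptions. `FakeTrack`, the `DATABASE` dict, and both task functions are hypothetical stand-ins, and JSON is used here only as an example serializer; none of it is Django or Celery code.

```python
import json

# Hypothetical stand-in for a database table: primary key -> row.
DATABASE = {1: {"name": "morning run"}}


class FakeTrack(object):
    """Hypothetical stand-in for a model instance holding a copy of one row."""

    def __init__(self, pk, name):
        self.pk = pk
        self.name = name

    @classmethod
    def get(cls, pk):
        row = DATABASE[pk]
        return cls(pk, row["name"])


def task_with_snapshot(payload):
    # The worker only sees the field values serialized at send time.
    return json.loads(payload)["name"]


def task_with_id(pk):
    # The worker re-reads the current row itself.
    return FakeTrack.get(pk).name


track = FakeTrack.get(1)
payload = json.dumps(vars(track))      # snapshot that would go on the queue
DATABASE[1]["name"] = "evening run"    # the row changes before the worker runs

print(task_with_snapshot(payload))  # value from the snapshot
print(task_with_id(1))              # value from the current row
```

Passing the id keeps the message small and leaves it to the worker to load whatever is in the database at the time it runs. It does not by itself help if the row is not committed yet, which is the situation in the question.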

So, I've solved it using django-transaction-hooks. It still looks kind of scary to replace my DB backend, but django-celery-transactions seems to be broken in Django 1.6. Now my setup looks like this:

settings.py:

DATABASES = {
    'default': {
        'ENGINE': 'transaction_hooks.backends.postgresql_psycopg2',
        'NAME': 'foo',
    },
}
SOUTH_DATABASE_ADAPTERS = {'default': 'south.db.postgresql_psycopg2'}  # this is required, or South breaks

models.py:

from celery import shared_task
from django.db import connection

@shared_task
def my_task(track):
    print 'in the task', track.id, track.points.all().count()

def some_method():
    t = Track()
    t.save()
    t = fill_with_points(t)  # creating points, attaching them to a Track
    t.save()
    print 'before the task', t.id, t.points.all().count()
    connection.on_commit(lambda: my_task.delay(t))

Results:

before the task, 21346, 2971
in the task, 21346, 2971

It still seems strange that such a common use case has no native celery or Django solution.
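The idea behind an on-commit hook can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not how django-transaction-hooks is implemented; `HookedConnection` and the `events` list are hypothetical names invented for the example.

```python
class HookedConnection(object):
    """Hypothetical connection wrapper that runs callbacks only after commit."""

    def __init__(self):
        self._callbacks = []

    def on_commit(self, func):
        # Queue the callback instead of running it right away.
        self._callbacks.append(func)

    def commit(self):
        # A real backend would commit to the database here, then fire the hooks.
        callbacks, self._callbacks = self._callbacks, []
        for func in callbacks:
            func()

    def rollback(self):
        # Nothing was committed, so the queued callbacks are dropped.
        self._callbacks = []


events = []
conn = HookedConnection()

conn.on_commit(lambda: events.append("task queued"))
events.append("rows written")   # work done inside the transaction
conn.commit()                   # the callback fires only now

conn.on_commit(lambda: events.append("never queued"))
conn.rollback()                 # the callback is discarded

print(events)
```

Because the task is only queued after the commit, the worker cannot start before the rows are visible to other connections. For what it's worth, Django 1.9 and later include `transaction.on_commit` in `django.db`, which covers the same use case without swapping the database backend.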
