
Celery+Django — Atomic transaction with database-related task

In my current project, which uses Django, Docker Compose, and Celery (among other things), the basic file-upload function, insertIntoDatabase , is called from a Celery task, and in views.py the task is invoked with delay .

In databaseinserter.py:

def insertIntoDatabase(datapoints, user, description):
    # datapoints is a list of dictionaries; user and description are strings
    # convert the data and upload it to our database
    ...

In tasks.py:

@app.task()
def db_ins_task(datapoints, user, description):
    from databaseinserter import insertIntoDatabase
    insertIntoDatabase(datapoints, user, description)

In views.py:

with transaction.atomic():
    db_ins_task.delay(datapoints, user, description)

Before Celery was introduced to the project, insertIntoDatabase was called directly in views.py , so any invalid list of datapoints (i.e. improperly formatted) would not be inserted: the whole upload was cancelled and rolled back. Now that the upload runs in an asynchronous Celery task, an invalid upload is no longer properly rolled back. How can I make sure that an invalid upload is still cancelled and undone entirely now that uploading is a task? It seems like Django 1.9 adds something that might be what I need: transaction.on_commit . However, the main problem with switching to 1.9 is that an important dependency of our project, django-hstore, doesn't appear to be compatible with it. 1.9 is also still in alpha, so it wouldn't be ideal to use even if the two were compatible. Is there a way to do this in Django 1.8?
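For context on what transaction.on_commit buys you: it defers a callback until the surrounding transaction actually commits, and discards it on rollback, which is exactly the dispatch-after-commit behaviour wanted here (the third-party django-transaction-hooks package backports the same hook to Django 1.8, if adding a dependency is an option). Below is a minimal pure-Python sketch of those semantics, using a stand-in FakeAtomic class since the real thing needs a configured Django database:

```python
# Pure-Python stand-in for transaction.atomic + on_commit semantics:
# callbacks queued during the block run only if the block exits cleanly
# ("commit") and are discarded if it raises ("rollback").
class FakeAtomic:
    def __init__(self):
        self._callbacks = []

    def on_commit(self, func):
        self._callbacks.append(func)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            for func in self._callbacks:
                func()  # "commit": this is where the Celery task would be enqueued
        self._callbacks = []  # "rollback": the task is never dispatched
        return False  # let any exception propagate

dispatched = []

# Valid upload: the block commits, so the "task" is dispatched.
with FakeAtomic() as txn:
    txn.on_commit(lambda: dispatched.append("db_ins_task"))

# Invalid upload: the block raises, so nothing is dispatched.
try:
    with FakeAtomic() as txn:
        txn.on_commit(lambda: dispatched.append("db_ins_task"))
        raise ValueError("improperly formatted datapoints")
except ValueError:
    pass

print(dispatched)  # only the first dispatch survived
```

The point of the sketch: with plain `delay` inside `transaction.atomic`, the task is handed to the broker immediately, possibly before the transaction commits at all, which is why the rollback no longer reaches it.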

I've also looked into django_transaction_barrier and have tried to use it but have had no luck. In tasks.py I changed the task to

@task(base=TransactionBarrierTask)
def db_ins_task(datapoints, user, description):
    from databaseinserter import insertIntoDatabase
    insertIntoDatabase(datapoints, user, description)

And in views.py I changed the task execution:

with transaction.atomic():
    db_ins_task.apply_async_with_barrier(args=(data, user, description,))

However, the main problem here is that once the task is received, Celery throws an error about an unexpected keyword argument:

worker_1   | Traceback (most recent call last):
worker_1   |   File "/usr/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
worker_1   |     R = retval = fun(*args, **kwargs)
worker_1   |   File "/usr/local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
worker_1   |     return self.run(*args, **kwargs)
worker_1   | TypeError: db_ins_task() got an unexpected keyword argument '__transaction_barrier'

So, what's the best way to go about this? Should I continue with trying to use django_transaction_barrier (if indeed I'm using it for the right thing)? If so, what am I doing wrong/missing that would lead to the error? If not, what's a better way to clear invalid uploads from my database?

Celery is an async task runner; once a task is handed off to Celery, it's fire-and-forget. You cannot have a transaction that spans process boundaries, because Celery runs the task in a separate worker process.

You can always run another task to find invalid datapoints and clean up your database. In short, you want distributed transactions with two-phase commit, which isn't easily doable: it has its own problems, and I'm not sure a ready-made implementation is available in Python.
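A lighter-weight variant of that idea: since the failure mode is "improperly formatted datapoints", validate the batch synchronously in the view before enqueueing, so a bad upload is rejected up front and nothing ever needs rolling back in the worker. A sketch, where the required keys are hypothetical since the question doesn't show the datapoint schema:

```python
def validate_datapoints(datapoints, required_keys=("timestamp", "value")):
    """Return True only if every datapoint is a dict carrying the expected keys.

    required_keys is hypothetical -- substitute whatever fields
    insertIntoDatabase actually expects.
    """
    return (
        isinstance(datapoints, list)
        and len(datapoints) > 0
        and all(
            isinstance(dp, dict) and set(required_keys) <= set(dp)
            for dp in datapoints
        )
    )

good = [{"timestamp": 1, "value": 2.5}, {"timestamp": 2, "value": 3.1}]
bad = [{"timestamp": 1}, "not even a dict"]

print(validate_datapoints(good))  # True
print(validate_datapoints(bad))   # False
```

In views.py this check would run before db_ins_task.delay(...), returning an error response instead of enqueueing when it fails.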

Have you considered moving the transaction.atomic block into the task, or even into the insert function itself? Either of those should work.
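In Django terms that means wrapping the body of db_ins_task (or of insertIntoDatabase) in `with transaction.atomic():`, so the rollback happens inside the worker process, where the writes actually occur. Here is a pure-Python stand-in for the resulting all-or-nothing behaviour, with a list playing the database, since a real `transaction.atomic` needs a configured Django connection:

```python
database = []  # stand-in for the real table

def insert_all_or_nothing(datapoints):
    """Insert every datapoint or none of them, mimicking what
    transaction.atomic around insertIntoDatabase would achieve."""
    staged = []
    for dp in datapoints:
        if not isinstance(dp, dict):  # an "improperly formatted" datapoint
            raise ValueError("invalid datapoint: %r" % (dp,))
        staged.append(dp)
    database.extend(staged)  # the "commit" happens only if every point passed

insert_all_or_nothing([{"a": 1}, {"a": 2}])
print(len(database))  # 2

try:
    insert_all_or_nothing([{"a": 3}, "broken"])
except ValueError:
    pass
print(len(database))  # still 2 -- the bad batch left no partial rows
```

The trade-off versus the pre-Celery setup: the rollback still works, but it happens asynchronously, so the view can no longer report the failure in its own HTTP response.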

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please cite the site URL or the original address. For any question, please contact: yoyou2525@163.com.
