
Celery redis backend not always returning result

I'm running a Celery worker whose startup banner looks like this:

 -------------- celery@ v3.1.23 (Cipater)
---- **** ----- 
--- * ***  * -- Linux-4.4.0-31-generic-x86_64-with-debian-stretch-sid
-- * - **** --- 
- ** ---------- [config]
- ** ---------- .> app:         __main__:0x7fe76cd42400
- ** ---------- .> transport:   amqp://
- ** ---------- .> results:     redis://
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ---- 
--- ***** ----- [queues]
 -------------- .> celery           exchange=celery(direct) key=celery
[tasks]
  . tasks.mytask

tasks.py:

import redis

# celery_app is the Celery application instance, configured elsewhere
@celery_app.task(bind=True, ignore_result=False)
def mytask(task):
    r = redis.StrictRedis()
    r.rpush('/task_finished', task.request.id)
    return {'result': 42}

When I run the following code and launch two tasks one after the other, it retrieves the first task's result correctly but fails to return the second one.

import celery.result
import redis
from celery import Celery

r = redis.StrictRedis()
celery_app = Celery(name="my_long_task", backend="redis://")

while True:
    _, resp = r.blpop('/task_finished')
    task_id = resp.decode('utf-8')
    task = celery.result.AsyncResult(task_id, app=celery_app)
    print(task)
    print(task.result)

Will return:

First loop :

990e2d04-5664-4d7c-8a5c-e9cb4ef45e24
{'result': 42}

Second loop (fails to return the result):

8463cc46-0884-4bf7-b838-f0614f74b271
{}

However, if I instantiate celery_app = Celery(name="my_long_task", backend="redis://") inside the while loop, it works every time.
What is wrong with not re-instantiating celery_app? What am I missing?

Edit:

Waiting a bit for the result (in case of latency) doesn't work either:

import time

while True:
    _, resp = r.blpop('/task_finished')
    task_id = resp.decode('utf-8')
    for i in range(0, 20):
        # Won't work unless I re-instantiate celery_app
        task = celery.result.AsyncResult(task_id, app=celery_app)
        print(task.result)
        time.sleep(1)

You have a race condition. This is what happens:

  1. The loop arrives at _, resp = r.blpop('/task_finished') and blocks there.

  2. The task executes r.rpush('/task_finished', task.request.id)

  3. The loop unblocks, executes task = celery.result.AsyncResult(task_id, app=celery_app) and gets an empty result because the task has not yet written its result to the result backend.

There may be a way to do the r.rpush after Celery has committed the result to the backend. Perhaps creating a custom class derived from Task would do it, but that's not something I've tried.
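
If you do want to try that route, here is a minimal sketch. It assumes Celery calls on_success only after the worker has stored the result in the backend (which appears to be how the Celery 3.1 task tracer behaves, but verify this for your version); celery_app is the application instance from the question:

import redis
from celery import Task

class NotifyWhenStored(Task):
    # Base task class that pushes the task id to Redis from on_success,
    # i.e. after the worker has handled the result, instead of from
    # inside the task body itself.
    abstract = True  # base class only, not registered as a task

    def on_success(self, retval, task_id, args, kwargs):
        r = redis.StrictRedis()
        r.rpush('/task_finished', task_id)

@celery_app.task(bind=True, base=NotifyWhenStored, ignore_result=False)
def mytask(task):
    return {'result': 42}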

However, you could certainly modify your code to store the results together with the task id. Something like:

r.rpush('/task_finished', json.dumps({ "task_id": task.request.id, "result": 42 }))

I've used JSON serialization for the sake of illustration. You can use whatever scheme you want. On reading:

_, resp = r.blpop('/task_finished')
resp = json.loads(resp)

With this, you might want to change ignore_result=False to ignore_result=True.
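
Putting the two pieces together, a minimal sketch of both sides (the task, the /task_finished key and the backend settings are taken from the question; treat it as an outline rather than tested code):

# tasks.py -- worker side
import json
import redis
from celery import Celery

celery_app = Celery("my_long_task", broker="amqp://", backend="redis://")

@celery_app.task(bind=True, ignore_result=True)
def mytask(task):
    result = {'result': 42}
    # Push the result itself along with the task id, so the consumer
    # never needs to query the Celery result backend at all.
    r = redis.StrictRedis()
    r.rpush('/task_finished',
            json.dumps({'task_id': task.request.id, 'result': result}))
    return result

# consumer side (separate process)
import json
import redis

r = redis.StrictRedis()
while True:
    _, resp = r.blpop('/task_finished')
    payload = json.loads(resp.decode('utf-8'))
    print(payload['task_id'], payload['result'])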
