Queuing a large number of Celery tasks

I'm writing a Python 3 app using the Celery distributed task queue library. Workers use greenlet threads for processing. The tasks are I/O-bound and perform network operations.

I need to submit a large number of Celery tasks as a single group. In this case it's around 10000 (10k) URLs at once, each as a separate Celery task.

Submitting them as a single group, with Redis or RabbitMQ running on localhost, takes almost 12 seconds, which is way too long.

Q: Is there any way to optimize this bulk insertion using Celery?

In other threads I found people advising the use of chunks. However, when I submit the tasks in chunks, each chunk is processed in a single thread, without using greenlets, which are necessary because the worker operations block on I/O. That results in performance degradation. Consider the following numbers:

  1. No chunks: insertion 12 seconds, processing 9 seconds.
  2. With chunks: insertion 3 seconds, processing 27 seconds.

So using chunks is not an option, because the blocking network operations cancel out the performance benefit of greenlet threads.
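The slowdown is consistent with how chunks work: each chunk becomes one task, and the calls inside it run one after another. The following is a minimal pure-Python model of that behaviour, not Celery's actual code; `run_chunk`, `split`, and the stand-in `check` function are hypothetical names used only for illustration.

```python
def split(items, n):
    """Split a list into consecutive pieces of at most n items."""
    return [items[i:i + n] for i in range(0, len(items), n)]


def run_chunk(task, chunk):
    """Model of a single chunk task: the calls run strictly in sequence."""
    return [task(*args) for args in chunk]


def check(host, port):
    # Hypothetical stand-in for the real network-bound task.
    return f"{host}:{port}"


args = [("a", "1"), ("b", "2"), ("c", "3")]
chunks = split(args, 2)
# Separate chunks can run concurrently on different greenlets,
# but the items inside one chunk cannot.
results = [run_chunk(check, chunk) for chunk in chunks]
```

With 10 items per chunk, a blocking call on the first item delays the other nine, so only one tenth as many operations can overlap compared with one task per URL.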

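import time
# Assumption: `check` is the Celery task, imported from the app's task module.

l = []
with open('input.txt') as f:
    for line in f:
        s = line.strip().split(':')
        # chunks() takes an iterable of argument tuples, not signatures
        l.append((s[0], s[1]))

t = time.time()
# 10 argument tuples per chunk; each chunk runs as one task
res = check.chunks(l, 10)()
#print(res.get())
print("Submission taken %f" % (time.time() - t))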

exit()

Chunks Result: Submission taken 2.251796 seconds

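import time
from celery import group
# Assumption: `check` is the Celery task, imported from the app's task module.

l = []
with open('input.txt') as f:
    for line in f:
        s = line.strip().split(':')
        # group() takes task signatures, one per URL
        l.append(check.s(s[0], s[1]))

job = group(l)
t = time.time()
result = job.apply_async()
print("Submission taken %f" % (time.time() - t))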

Regular Result: Submission taken 12.54412 seconds

Celery has canvas primitives for exactly this, called group and chunks.

https://docs.celeryproject.org/en/latest/userguide/canvas.html

Chunks require a result backend, I think, but just splitting your tasks into groups of, say, 50 to 200 URLs should allow Celery to optimize for you.
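A rough sketch of that splitting, assuming the argument tuples are already in a list: `batched` and `submit_in_groups` are hypothetical helpers, and the Celery wiring is shown only as a comment because it depends on your app. Whether this actually shortens the submission time would need to be measured.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def submit_in_groups(arg_tuples, size, make_group):
    """Call `make_group` once per batch and collect what it returns."""
    return [make_group(batch) for batch in batched(arg_tuples, size)]


# Possible wiring with Celery (assumes `check` is the task from the question):
#   from celery import group
#   results = submit_in_groups(
#       l, 100,
#       lambda batch: group(check.s(*a) for a in batch).apply_async())
```

Every URL remains its own task here, so the worker's greenlets can still overlap the network waits; only the submission is split into smaller groups.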

However, if you are executing 10000 network-bound tasks, it's going to take a hot second.
