This code works like this:
A list named 'ids' contains message ID numbers, which I use to download specific messages. 'nDownload' is the list index, and the queue has a maximum size of 5.
I pick an item from the list, download one message at a time, and add it to the queue until 'nDownload' is equal to 6 (the queue is full). The code works; I have had no issues so far:
nDownload = 0
workers = []
while nDownload <= len(ids):
    try:
        async for msg in get_messages(channel, ids=ids[nDownload]):
            nDownload = nDownload + 1
            try:
                queue.put_nowait(msg)
            except (asyncio.QueueFull, IndexError) as qErr:
                nDownload = nDownload - 1
                workers = [asyncio.create_task(worker(queue)) for _ in range(5)]
                await queue.join()
                for cancel in workers:
                    cancel.cancel()
    except IndexError as iErr:
        break
Question: sometimes messages have different sizes. For example:
message 1 = 100 MB, downloaded in 8 minutes
message 2 = 1 MB, downloaded in 5 seconds
Once the shortest message (message 2) has been downloaded, a 'slot' in the queue frees up. Unfortunately, I have to wait for message 1 as well, because of queue.join().
How do I add a new item to the queue at that moment?
Why do I use queue.join()? Because I don't know how else to add at most 5 messages to the queue, wait for them to download, and resume. I really need to download messages in groups, not all at once. Thanks
EDIT: yes, my worker is defined like this (simplified):
async def worker(queue):
    while True:
        queue_msg = await queue.get()
        loop = asyncio.get_event_loop()
        try:
            task = loop.create_task(extract(queue_msg))
            await asyncio.wait_for(task, timeout=timeout)
        except errors.Fail:
            # Here I have to requeue the message when it fails,
            # so it requeues the ID in order to download the same msg later
            await queue.put(queue_msg.id)
        except asyncio.TimeoutError:
            # requeue the msg etc...
            pass
        finally:
            queue.task_done()
Your answer is very smart, thanks. However, I chose a queue size > 1 because I need to refetch a message when its 'extract' task fails (sorry, I didn't tell you that). I don't know what happens if the queue size is 1 and I try to add an item. This is a bit hard.
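For what it's worth, a size-1 queue is safe to add to: queue.put_nowait() raises asyncio.QueueFull when the slot is taken, but await queue.put() simply suspends the producer until a consumer frees the slot. A minimal sketch demonstrating both behaviors (the names and timings here are illustrative, not from the original code):

```python
import asyncio

async def main():
    queue = asyncio.Queue(1)  # a single slot
    results = []

    async def consumer():
        while True:
            item = await queue.get()
            await asyncio.sleep(0.01)  # simulate work
            results.append(item)
            queue.task_done()

    worker = asyncio.create_task(consumer())

    # put_nowait raises QueueFull once the single slot is taken...
    queue.put_nowait("a")
    try:
        queue.put_nowait("b")
    except asyncio.QueueFull:
        results.append("full")

    # ...but 'await queue.put()' just waits for a free slot,
    # so the producer pauses instead of failing.
    await queue.put("b")
    await queue.join()
    worker.cancel()
    return results

print(asyncio.run(main()))  # → ['full', 'a', 'b']
```

So with await queue.put() the producer naturally blocks while the queue is full and resumes as soon as any worker finishes, which is exactly the "free slot" behavior asked about above.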
It's not completely clear what your constraints are, but if I understand you correctly:
The queue size should be irrelevant for your purposes; it only serves as a buffer in case the workers are temporarily faster than get_messages. I'd even start with a queue size of 1 and experiment with whether larger values help with performance.
Spawning tasks on QueueFull seems strange and unnecessary. The idiomatic way to approach a producer-consumer pattern is to create a fixed number of consumers and have them process multiple items as they arrive. You didn't show worker, so it's not clear whether each worker processes just one message or multiple ones.
I would rewrite the loop as:
queue = asyncio.Queue(1)
workers = [asyncio.create_task(worker(queue)) for _ in range(5)]

for current in ids:
    async for msg in get_messages(channel, ids=current):
        # enqueue msg, waiting (if needed) for a free slot in the queue
        await queue.put(msg)

# wait for the remaining enqueued items to be processed
await queue.join()

# cancel the now-idle workers, which wait for a new message
# that will never arrive
for w in workers:
    w.cancel()
A worker would be defined like this:
async def worker(queue):
    while True:
        msg = await queue.get()
        ... process msg ...
        queue.task_done()
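Putting the two pieces together, here is a self-contained, runnable sketch of the same pattern. The message-fetching stub, its timings, and the processing delay are assumptions for demonstration only; in the real code those would be your client library's get_messages and your extract coroutine:

```python
import asyncio

async def get_messages_stub(msg_id):
    """Stand-in for the real client call (assumption: larger IDs
    take longer to fetch, mimicking messages of different sizes)."""
    await asyncio.sleep(0.01 * msg_id)
    return f"msg-{msg_id}"

processed = []

async def worker(queue):
    while True:
        msg = await queue.get()
        try:
            await asyncio.sleep(0.01)  # ... process msg ("extract") ...
            processed.append(msg)
        finally:
            queue.task_done()

async def main(ids):
    queue = asyncio.Queue(1)
    workers = [asyncio.create_task(worker(queue)) for _ in range(5)]

    for current in ids:
        msg = await get_messages_stub(current)
        # blocks only while every worker is busy and the slot is taken
        await queue.put(msg)

    await queue.join()   # wait for the remaining enqueued items
    for w in workers:    # cancel the now-idle workers
        w.cancel()

asyncio.run(main([3, 1, 2]))
print(processed)
```

Because a slow message never blocks queue.join() mid-batch here, the producer keeps feeding workers as soon as any of them frees up, which is the behavior the question was after.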