
How to add new item to asyncio queue as soon as a slot is available?

This code works like this:

A list named 'ids' contains message ID numbers; I use these IDs to download specific messages. 'nDownload' is the index into that list. The maximum size of the queue is 5.

I pick an item from the list, download one message at a time, and add it to the queue. When nDownload reaches 6:

  1. A QueueFull exception occurs.
  2. 5 workers are created.
  3. The workers extract metadata from each message for other purposes.
  4. await queue.join() blocks until all items in the queue have been received and processed.
  5. End -> cancel the workers.

The code works; I have had no issues so far.

nDownload = 0
workers = []
while nDownload <= len(ids):
    try:
        async for msg in get_messages(channel, ids=ids[nDownload]):
            nDownload = nDownload + 1
            try:
                queue.put_nowait(msg)
            except (asyncio.QueueFull, IndexError) as qErr:
                nDownload = nDownload - 1
                workers = [asyncio.create_task(worker(queue)) for _ in range(5)]
                await queue.join()
                for cancel in workers:
                    cancel.cancel()
    except IndexError as iErr:
        break

Question: messages have different sizes, so they take different amounts of time to download. For example:

message 1 = 100MB downloaded in 8 minutes

message 2 = 1MB downloaded in 5 seconds

Once the shortest message (message 2) has been downloaded, a slot in the queue becomes free. Unfortunately, I still have to wait for message 1 because of queue.join().

How do I add a new item to the queue at that moment?

Why do I use queue.join()? Because I don't know how else to add at most 5 messages to the queue, wait for them to be downloaded, and then resume. I really need to download the messages in groups, not all at once. Thanks

EDIT: Yes, my worker is defined like this (simplified):

async def worker(queue):
    while True:
        queue_msg = await queue.get()
        loop = asyncio.get_event_loop()
        try:
            task = loop.create_task(extract(queue_msg))
            await asyncio.wait_for(task, timeout=timeout)
        except errors.Fail:
            # Here I have to requeue the message when it fails,
            # so it requeues the ID in order to download the same msg later
            await queue.put(queue_msg.id)
        except asyncio.TimeoutError:
            # requeue the msg etc...
            await queue.put(queue_msg.id)
        finally:
            queue.task_done()

Your answer is very smart, thanks. However, I chose a queue size > 1 because I need to re-fetch a message when its 'extract' task fails (sorry, I didn't mention that). I don't know what would happen if the queue size were 1 and I tried to add an item. This is a bit hard.
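For what it's worth, requeueing from inside a worker appears to be safe even with maxsize=1, because the queue.get() that delivered the message already freed a slot before the worker calls put(). Here is a minimal runnable sketch of that scenario; extract, the message names, and the failure condition are all made up for illustration:

```python
import asyncio

async def extract(msg, failed_once):
    # Hypothetical stand-in for the real extract(): fails the first
    # time it sees "msg-2", succeeds when the message is retried.
    if msg == "msg-2" and msg not in failed_once:
        failed_once.add(msg)
        raise RuntimeError("extract failed")
    return msg

async def worker(queue, failed_once, done):
    while True:
        msg = await queue.get()
        try:
            done.append(await extract(msg, failed_once))
        except RuntimeError:
            # queue.get() above already freed a slot, so this put()
            # blocks at most briefly, even with maxsize=1
            await queue.put(msg)
        finally:
            queue.task_done()

async def main():
    queue = asyncio.Queue(1)
    failed_once, done = set(), []
    workers = [asyncio.create_task(worker(queue, failed_once, done))
               for _ in range(5)]
    for msg in ["msg-1", "msg-2", "msg-3"]:
        await queue.put(msg)
    # join() returns only after the retried message is processed too,
    # because the requeueing put() happens before the failing task_done()
    await queue.join()
    for w in workers:
        w.cancel()
    return done

print(asyncio.run(main()))
```

Note that the requeued message keeps queue.join() waiting: the worker calls put() before task_done(), so the unfinished-task count never drops to zero until the retry has actually been processed.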

It's not completely clear what your constraints are, but if I understand you correctly:

  • you want to download at most 5 things in parallel
  • you don't want to waste time - as soon as a worker is done with an item, it should obtain a new one

The queue size should be irrelevant for your purposes; it only serves as a buffer in case the workers are temporarily faster than get_messages. I'd even start with a queue size of 1 and experiment with whether larger values help performance.

Spawning tasks on QueueFull seems strange and unnecessary. The idiomatic way to approach a producer-consumer pattern is to create a fixed number of consumers and have them process multiple items as they arrive. You didn't show worker, so it's not clear whether each worker processes just one message or multiple ones.

I would rewrite the loop as:

queue = asyncio.Queue(1)
workers = [asyncio.create_task(worker(queue)) for _ in range(5)]
for current in ids:
    async for msg in get_messages(channel, id=current):
        # enqueue msg, waiting (if needed) for a free slot in the queue
        await queue.put(msg)
# wait for the remaining enqueued items to be processed
await queue.join()
# cancel the now-idle workers, which wait for a new message
# that will never arrive
for w in workers:
    w.cancel()

A worker would be defined like this:

async def worker(queue):
    while True:
        msg = await queue.get()
        ... process msg ...
        queue.task_done()
