asyncio.gather(*tasks) fails to await only a subset of all tasks

Problem to solve (simplified)

Let's say I have 26 tasks to run in parallel. To minimize the load on the server, I decided to run them 10 at a time: first 10 tasks in parallel, then the next 10, and finally the remaining 6.

I wrote a simple script to achieve this behavior:

import asyncio
from string import ascii_uppercase
from typing import List

TASK_NAMES = ascii_uppercase  # 26 fake tasks in total


class BatchWorker:
    """Run a list of tasks in batch."""

    BATCH_SIZE = 10

    def __init__(self, tasks: List[asyncio.Task]):
        self._tasks = list(tasks)

    @property
    def batch_of_tasks(self):
        """Yield all tasks by chunks of `BATCH_SIZE`"""
        start = 0
        while 'there are items remaining in the list':
            end = start + self.BATCH_SIZE
            chunk = self._tasks[start:end]
            if not chunk:
                break
            yield chunk
            start = end

    async def run(self):
        print(f'Running {self.BATCH_SIZE} tasks at a time')
        for batch in self.batch_of_tasks:
            print(f'\nWaiting for {len(batch)} tasks to complete...')
            await asyncio.gather(*batch)
            print('\nSleeping...\n---')
            await asyncio.sleep(1)


async def task(name: str):
    print(f"Task '{name}' is running...")
    await asyncio.sleep(3)  # Pretend to do something


async def main():
    tasks = [
      asyncio.create_task(task(name))
      for name in TASK_NAMES
    ]
    worker = BatchWorker(tasks)
    await worker.run()


if __name__ == '__main__':
    asyncio.run(main())

What I would expect

I expected the logs to be as follows:

Task A is running
[...]
Task J is running
Sleeping
---
Task K is running
[...]
Task T is running
Sleeping
---
[...]

... you get the point.

What I actually get

However, on the very first iteration, the worker waits for all 26 tasks to complete, even though I asked it to gather only a batch of 10 of them. Check out the logs:

Running 10 tasks at a time

Waiting for 10 tasks to complete...
Task 'A' is running...
Task 'B' is running...
Task 'C' is running...
Task 'D' is running...
Task 'E' is running...
Task 'F' is running...
Task 'G' is running...
Task 'H' is running...
Task 'I' is running...
Task 'J' is running...
Task 'K' is running...
Task 'L' is running...
Task 'M' is running...
Task 'N' is running...
Task 'O' is running...
Task 'P' is running...
Task 'Q' is running...
Task 'R' is running...
Task 'S' is running...
Task 'T' is running...
Task 'U' is running...
Task 'V' is running...
Task 'W' is running...
Task 'X' is running...
Task 'Y' is running...
Task 'Z' is running...

Sleeping...
---

Waiting for 10 tasks to complete...

Sleeping...
---

Waiting for 6 tasks to complete...

Sleeping...
---

As you can see, there are 3 batches in total (as expected), but only the first one does any work. The remaining 2 have nothing to do.

My questions

  1. Given that the official docs state that .gather() will run only the awaitables provided as parameters concurrently, why is my script running all of my tasks instead of chunks of them?

  2. What else am I supposed to use to make it work as I'd like?

gather doesn't really "run" the awaitables; it just suspends while the event loop does its thing, and wakes up once the awaitables it received are done. What your code does is:

  1. use asyncio.create_task() to spawn a bunch of awaitables in the background.
  2. use asyncio.gather() to wait in batches until some of them have finished.

The fact that gather() in #2 receives a subset of tasks created in #1 won't prevent the rest of the tasks created in #1 from happily running.
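The point above can be checked with a small sketch. The names job, started and demo are made up for this illustration: four tasks are created up front, only the first two are passed to gather(), and yet all four have started by the time that gather() returns.

```python
import asyncio


async def job(name: str, started: list) -> None:
    started.append(name)      # record that this job began executing
    await asyncio.sleep(0.1)  # pretend to do something


async def demo() -> list:
    started = []
    # create_task() schedules all four jobs on the event loop right away.
    tasks = [asyncio.create_task(job(name, started)) for name in "ABCD"]
    # Wait for the first two only; C and D keep running regardless.
    await asyncio.gather(*tasks[:2])
    snapshot = list(started)
    # Let the remaining tasks finish so nothing is left pending at shutdown.
    await asyncio.gather(*tasks[2:])
    return snapshot


if __name__ == "__main__":
    print(asyncio.run(demo()))  # all four names, not just A and B
```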

To fix the problem, you must postpone calling create_task() until the last possible moment. In fact, since gather() calls ensure_future() on its arguments (and ensure_future() called with a coroutine object ends up calling create_task()), you needn't call create_task() at all. If you remove the call to create_task() from main and just pass the coroutine objects to the BatchWorker (and subsequently to gather), the tasks will be both scheduled and awaited in batches, just as you want them:

async def main():
    tasks = [task(name) for name in TASK_NAMES]
    worker = BatchWorker(tasks)
    await worker.run()
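The same idea can be written as a standalone helper, shown here as a minimal sketch rather than a definitive implementation. run_in_batches, job and the log list are hypothetical names introduced for this example. The log records start and end events so the batching can be verified.

```python
import asyncio
from typing import Awaitable, Iterable, List


async def run_in_batches(coros: Iterable[Awaitable], batch_size: int) -> List:
    """Await `coros` in consecutive groups of at most `batch_size`."""
    pending = list(coros)
    results = []
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        # gather() turns each coroutine into a task at this point, so a
        # batch only starts once the previous batch has finished.
        results.extend(await asyncio.gather(*batch))
    return results


async def demo() -> List[str]:
    log = []

    async def job(name: str) -> str:
        log.append(f"start {name}")
        await asyncio.sleep(0.05)  # pretend to do something
        log.append(f"end {name}")
        return name

    # Plain coroutine objects: nothing is scheduled until gather() sees them.
    await run_in_batches([job(name) for name in "ABCDE"], batch_size=2)
    return log


if __name__ == "__main__":
    print("\n".join(asyncio.run(demo())))
```

In the log, "start C" only appears after both "end A" and "end B", which is the behavior the question asks for. One caveat of this sketch: if a job raises, the coroutines in later batches are never awaited and Python will warn about that.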

I modified your code to behave the way I think you want it to work:

import asyncio
from string import ascii_uppercase
from typing import List

TASK_NAMES = ascii_uppercase  # 26 fake tasks in total


class BatchWorker:
    """Run a list of tasks in batch."""

    BATCH_SIZE = 10

    def __init__(self, tasks: List[asyncio.Task]):
        self._tasks = list(tasks)

    @property
    def batch_of_tasks(self):
        """Yield all tasks by chunks of `BATCH_SIZE`"""
        start = 0
        while 'there are items remaining in the list':
            end = start + self.BATCH_SIZE
            chunk = self._tasks[start:end]
            if not chunk:
                break
            yield chunk
            start = end

    async def run(self):
        print(f'Running {self.BATCH_SIZE} tasks at a time')
        for batch in self.batch_of_tasks:
            print(f'\nWaiting for {len(batch)} tasks to complete...')
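            # asyncio.wait() needs tasks, not bare coroutines
            # (passing coroutines raises TypeError since Python 3.11).
            await asyncio.wait([asyncio.create_task(coro) for coro in batch])
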
async def task(name: str):
    print(f"Task '{name}' is running...")
    await asyncio.sleep(3)  # Pretend to do something


async def main():
    tasks = [
      task(name)
      for name in TASK_NAMES
    ]
    worker = BatchWorker(tasks)
    await worker.run()


if __name__ == '__main__':
    asyncio.run(main())

In the modified code, we build a list of coroutine objects (created but not yet scheduled). Each batch is then handed to the event loop through wait, and we await its completion before letting the for loop move on to the next batch. This way the tasks are grouped into batches of at most ten, as you are trying to do.

Note: As the comments on this answer point out, there is very little difference between gather and wait in this case; my original explanation was incorrect.
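For batching purposes the two calls are interchangeable, but they differ in what they accept and return. The sketch below illustrates this with a hypothetical square coroutine: gather() accepts coroutine objects and returns results in argument order, whereas wait() expects tasks (or futures) and returns two sets, done and pending.

```python
import asyncio


async def square(n: int) -> int:
    await asyncio.sleep(0.01 * (3 - n))  # later arguments finish first
    return n * n


async def demo():
    # gather(): takes coroutines directly, results follow argument order
    # even though the jobs complete in reverse order.
    gathered = await asyncio.gather(square(0), square(1), square(2))

    # wait(): takes tasks and returns (done, pending) as sets of tasks.
    tasks = [asyncio.create_task(square(n)) for n in range(3)]
    done, pending = await asyncio.wait(tasks)
    waited = sorted(t.result() for t in done)  # sets carry no ordering
    return gathered, waited, len(pending)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```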
