
asyncio.gather vs asyncio.wait

asyncio.gather and asyncio.wait seem to have similar uses: I have a bunch of async things that I want to execute/wait for (not necessarily waiting for one to finish before the next one starts). They use a different syntax and differ in some details, but it seems very un-pythonic to me to have two functions with such a huge overlap in functionality. What am I missing?

Although similar in general cases ("run and get results for many tasks"), each function has some specific functionality for other cases:

asyncio.gather()

Returns a Future instance, allowing high-level grouping of tasks:

import asyncio
from pprint import pprint

import random


async def coro(tag):
    print(">", tag)
    await asyncio.sleep(random.uniform(1, 3))
    print("<", tag)
    return tag


async def main():
    # Each gather() call returns a Future that represents the whole group.
    group1 = asyncio.gather(*[coro("group 1.{}".format(i)) for i in range(1, 6)])
    group2 = asyncio.gather(*[coro("group 2.{}".format(i)) for i in range(1, 4)])
    group3 = asyncio.gather(*[coro("group 3.{}".format(i)) for i in range(1, 10)])

    # Groups are awaitable too, so they can be gathered into a larger group.
    all_groups = asyncio.gather(group1, group2, group3)
    return await all_groups


results = asyncio.run(main())

pprint(results)

All tasks in a group can be cancelled by calling group2.cancel() or even all_groups.cancel(). See also .gather(..., return_exceptions=True).
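To illustrate group cancellation, here is a minimal sketch. The worker() coroutine and the shared log list are hypothetical names made up for this example; only asyncio.gather(), Future.cancel() and asyncio.CancelledError come from the library.

```python
import asyncio


async def worker(tag, log):
    # Hypothetical worker: records whether it finished or was cancelled.
    try:
        await asyncio.sleep(1)
        log.append((tag, "finished"))
    except asyncio.CancelledError:
        log.append((tag, "cancelled"))
        raise


async def main():
    log = []
    group = asyncio.gather(worker("a", log), worker("b", log))
    await asyncio.sleep(0.1)  # give both workers time to start
    group.cancel()            # cancels every child that is still running
    try:
        await group
    except asyncio.CancelledError:
        log.append(("group", "cancelled"))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Cancelling the group future is enough here: neither worker reaches its "finished" branch.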

asyncio.wait()

Supports stopping the wait after the first task is done, or after a specified timeout, allowing lower-level precision of operations:

import asyncio
import random


async def coro(tag):
    print(">", tag)
    await asyncio.sleep(random.uniform(0.5, 5))
    print("<", tag)
    return tag


async def main():
    # asyncio.wait() expects tasks or futures; since Python 3.11 it rejects
    # bare coroutines, so wrap each one in a task first.
    tasks = [asyncio.create_task(coro(i)) for i in range(1, 11)]

    print("Get first result:")
    finished, unfinished = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_COMPLETED)

    for task in finished:
        print(task.result())
    print("unfinished:", len(unfinished))

    print("Get more results in 2 seconds:")
    finished2, unfinished2 = await asyncio.wait(unfinished, timeout=2)

    for task in finished2:
        print(task.result())
    print("unfinished2:", len(unfinished2))

    print("Get all other results:")
    if unfinished2:  # asyncio.wait() raises ValueError on an empty set
        finished3, unfinished3 = await asyncio.wait(unfinished2)

        for task in finished3:
            print(task.result())


asyncio.run(main())

asyncio.wait is lower level than asyncio.gather.

As the name suggests, asyncio.gather mainly focuses on gathering the results. It waits on a bunch of futures and returns their results in the order in which they were passed in.
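A small sketch of that ordering guarantee, using a hypothetical fetch() coroutine with made-up delays: the slower awaitable is passed first and still comes first in the result list.

```python
import asyncio


async def fetch(tag, delay):
    # Hypothetical stand-in for real work: sleep, then return the tag.
    await asyncio.sleep(delay)
    return tag


async def main():
    # "slow" finishes last, but results follow the argument order.
    return await asyncio.gather(fetch("slow", 0.2), fetch("fast", 0.05))


if __name__ == "__main__":
    print(asyncio.run(main()))
```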

asyncio.wait just waits on the futures. Instead of giving you the results directly, it gives you the sets of done and pending tasks. You have to collect the values manually.

Moreover, with wait you can specify whether to wait for all futures to finish or just the first one.
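As a rough sketch of collecting values by hand from wait(), again with a hypothetical fetch() coroutine and made-up delays: the first call returns as soon as one task completes, and the second call waits for whatever is left.

```python
import asyncio


async def fetch(tag, delay):
    # Hypothetical stand-in for real work: sleep, then return the tag.
    await asyncio.sleep(delay)
    return tag


async def main():
    tasks = [
        asyncio.create_task(fetch("slow", 0.3)),
        asyncio.create_task(fetch("fast", 0.05)),
    ]
    # Return as soon as any one task has finished.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    first = [t.result() for t in done]

    # Default return_when is ALL_COMPLETED: wait for the remaining tasks.
    rest_done, _ = await asyncio.wait(pending)
    rest = [t.result() for t in rest_done]
    return first, rest


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Unlike gather(), the done set is unordered, so code that needs to match results to inputs has to keep its own mapping from task to input.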

A very important distinction, which is easy to miss, is the default behavior of these two functions when it comes to exceptions.


I'll use this example to simulate a coroutine that sometimes raises an exception -

import asyncio
import random


async def a_flaky_tsk(i):
    await asyncio.sleep(i)  # bit of fuzz to simulate a real-world example

    if i % 2 == 0:
        print(i, "ok")
    else:
        print(i, "crashed!")
        raise ValueError

coros = [a_flaky_tsk(i) for i in range(10)]

await asyncio.gather(*coros) outputs -

0 ok
1 crashed!
Traceback (most recent call last):
  File "/Users/dev/PycharmProjects/trading/xxx.py", line 20, in <module>
    asyncio.run(main())
  File "/Users/dev/.pyenv/versions/3.8.2/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/Users/dev/.pyenv/versions/3.8.2/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/Users/dev/PycharmProjects/trading/xxx.py", line 17, in main
    await asyncio.gather(*coros)
  File "/Users/dev/PycharmProjects/trading/xxx.py", line 12, in a_flaky_tsk
    raise ValueError
ValueError

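As you can see, the coroutines after index 1 never got to finish. gather() does not cancel them; it propagates the first exception to the caller, and here the program exits (asyncio.run() cancels whatever is still running) before they complete.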
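A minimal sketch of what happens to the sibling tasks when gather() raises. The flaky() coroutine and the finished list are hypothetical, modelled on the example above with shorter delays. The caller catches the exception and keeps the event loop alive, and the remaining tasks run to completion.

```python
import asyncio


async def flaky(i, finished):
    # Hypothetical task: odd indices crash, even ones record success.
    await asyncio.sleep(i * 0.05)
    if i % 2:
        raise ValueError(i)
    finished.append(i)


async def main():
    finished = []
    tasks = [asyncio.create_task(flaky(i, finished)) for i in range(4)]
    try:
        await asyncio.gather(*tasks)
    except ValueError as exc:
        print("gather raised:", repr(exc))
    # gather() did not cancel the other tasks; let them settle.
    await asyncio.wait(tasks)
    return finished


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Task 2 still completes even though gather() had already raised because of task 1.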


But await asyncio.wait(coros) continues to execute tasks, even if some of them fail (on Python 3.11 and later, wrap each coroutine in asyncio.create_task() first, because wait() no longer accepts bare coroutines) -

0 ok
1 crashed!
2 ok
3 crashed!
4 ok
5 crashed!
6 ok
7 crashed!
8 ok
9 crashed!
Task exception was never retrieved
future: <Task finished name='Task-10' coro=<a_flaky_tsk() done, defined at /Users/dev/PycharmProjects/trading/xxx.py:6> exception=ValueError()>
Traceback (most recent call last):
  File "/Users/dev/PycharmProjects/trading/xxx.py", line 12, in a_flaky_tsk
    raise ValueError
ValueError
Task exception was never retrieved
future: <Task finished name='Task-8' coro=<a_flaky_tsk() done, defined at /Users/dev/PycharmProjects/trading/xxx.py:6> exception=ValueError()>
Traceback (most recent call last):
  File "/Users/dev/PycharmProjects/trading/xxx.py", line 12, in a_flaky_tsk
    raise ValueError
ValueError
Task exception was never retrieved
future: <Task finished name='Task-2' coro=<a_flaky_tsk() done, defined at /Users/dev/PycharmProjects/trading/xxx.py:6> exception=ValueError()>
Traceback (most recent call last):
  File "/Users/dev/PycharmProjects/trading/xxx.py", line 12, in a_flaky_tsk
    raise ValueError
ValueError
Task exception was never retrieved
future: <Task finished name='Task-9' coro=<a_flaky_tsk() done, defined at /Users/dev/PycharmProjects/trading/xxx.py:6> exception=ValueError()>
Traceback (most recent call last):
  File "/Users/dev/PycharmProjects/trading/xxx.py", line 12, in a_flaky_tsk
    raise ValueError
ValueError
Task exception was never retrieved
future: <Task finished name='Task-3' coro=<a_flaky_tsk() done, defined at /Users/dev/PycharmProjects/trading/xxx.py:6> exception=ValueError()>
Traceback (most recent call last):
  File "/Users/dev/PycharmProjects/trading/xxx.py", line 12, in a_flaky_tsk
    raise ValueError
ValueError

Of course, this behavior can be changed for both by using -

asyncio.gather(..., return_exceptions=True)

or,

asyncio.wait([...], return_when=asyncio.FIRST_EXCEPTION)
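As a hedged sketch of the second option, with a hypothetical flaky() coroutine and made-up delays: with return_when=asyncio.FIRST_EXCEPTION, wait() returns as soon as one task fails, and the tasks that are still running come back in the pending set. They keep running unless the caller cancels them.

```python
import asyncio


async def flaky(i):
    # Hypothetical task: only index 1 crashes.
    await asyncio.sleep(i * 0.05)
    if i == 1:
        raise ValueError(i)
    return i


async def main():
    tasks = [asyncio.create_task(flaky(i)) for i in range(4)]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)

    # Reading the exception marks it as retrieved.
    failures = [repr(t.exception()) for t in done if t.exception() is not None]

    # wait() leaves the pending tasks alone; cancel them explicitly.
    for t in pending:
        t.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending), failures


if __name__ == "__main__":
    print(asyncio.run(main()))
```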


But it doesn't end here!

Notice the "Task exception was never retrieved" messages in the logs above.

asyncio.wait() won't re-raise exceptions from the child tasks until you await them individually. (The stack traces in the logs are just messages; they cannot be caught!)

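# Wrap the coroutines in tasks: wait() rejects bare coroutines on Python 3.11+.
done, pending = await asyncio.wait([asyncio.create_task(c) for c in coros])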
for tsk in done:
    try:
        await tsk
    except Exception as e:
        print("I caught:", repr(e))

Output -

0 ok
1 crashed!
2 ok
3 crashed!
4 ok
5 crashed!
6 ok
7 crashed!
8 ok
9 crashed!
I caught: ValueError()
I caught: ValueError()
I caught: ValueError()
I caught: ValueError()
I caught: ValueError()

On the other hand, to catch exceptions with asyncio.gather(), you must -

results = await asyncio.gather(*coros, return_exceptions=True)
for result_or_exc in results:
    if isinstance(result_or_exc, Exception):
        print("I caught:", repr(result_or_exc))

(Same output as before)

I also noticed that you can provide a group of coroutines to wait() by simply passing a list (Python 3.11 and later require tasks or futures here rather than bare coroutines):

result=loop.run_until_complete(asyncio.wait([
        say('first hello', 2),
        say('second hello', 1),
        say('third hello', 4)
    ]))

Whereas grouping in gather() is done by passing the coroutines as separate arguments:

result=loop.run_until_complete(asyncio.gather(
        say('first hello', 2),
        say('second hello', 1),
        say('third hello', 4)
    ))

In addition to all the previous answers, I would like to describe how gather() and wait() behave differently when they are cancelled.

gather() cancellation

If gather() is cancelled, all submitted awaitables that have not completed yet are also cancelled.

wait() cancellation

If the task that is awaiting wait() is cancelled, it simply raises a CancelledError and the waited-on tasks remain intact.

Simple example:

import asyncio


async def task(arg):
    await asyncio.sleep(5)
    return arg


async def cancel_waiting_task(work_task, waiting_task):
    await asyncio.sleep(2)
    waiting_task.cancel()
    try:
        await waiting_task
        print("Waiting done")
    except asyncio.CancelledError:
        print("Waiting task cancelled")

    try:
        res = await work_task
        print(f"Work result: {res}")
    except asyncio.CancelledError:
        print("Work task cancelled")


async def main():
    print("asyncio.wait()")
    work_task = asyncio.create_task(task("done"))
    waiting = asyncio.create_task(asyncio.wait({work_task}))
    await cancel_waiting_task(work_task, waiting)

    print("----------------")
    print("asyncio.gather()")
    work_task = asyncio.create_task(task("done"))
    waiting = asyncio.gather(work_task)
    await cancel_waiting_task(work_task, waiting)


asyncio.run(main())

Output:

asyncio.wait()
Waiting task cancelled
Work result: done
----------------
asyncio.gather()
Waiting task cancelled
Work task cancelled

Application

Sometimes it becomes necessary to combine the functionality of wait() and gather(). For example, we want to wait for at least one task to complete and then cancel the remaining pending tasks, and if the waiting itself is cancelled, also cancel all pending tasks.

As a real example, let's say we have a disconnect event and a work task. We want to wait for the result of the work task, but cancel it if the connection is lost. Or we make several parallel requests and, as soon as at least one response arrives, cancel all the others.

It could be done this way:

import asyncio
from typing import Optional, Tuple, Set


async def wait_any(
        tasks: Set[asyncio.Future], *, timeout: Optional[int] = None,
) -> Tuple[Set[asyncio.Future], Set[asyncio.Future]]:
    tasks_to_cancel: Set[asyncio.Future] = set()
    try:
        done, tasks_to_cancel = await asyncio.wait(
            tasks, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
        )
        return done, tasks_to_cancel
    except asyncio.CancelledError:
        tasks_to_cancel = tasks
        raise
    finally:
        for task in tasks_to_cancel:
            task.cancel()


async def task():
    await asyncio.sleep(5)


async def check_tasks(waiting_task, working_task, waiting_conn_lost_task):
    try:
        await waiting_task
        print("waiting is done")
    except asyncio.CancelledError:
        print("waiting is cancelled")

    try:
        await waiting_conn_lost_task
        print("connection is lost")
    except asyncio.CancelledError:
        print("waiting connection lost is cancelled")

    try:
        await working_task
        print("work is done")
    except asyncio.CancelledError:
        print("work is cancelled")


async def work_done_case():
    working_task = asyncio.create_task(task())
    connection_lost_event = asyncio.Event()
    waiting_conn_lost_task = asyncio.create_task(connection_lost_event.wait())
    waiting_task = asyncio.create_task(wait_any({working_task, waiting_conn_lost_task}))
    await check_tasks(waiting_task, working_task, waiting_conn_lost_task)


async def conn_lost_case():
    working_task = asyncio.create_task(task())
    connection_lost_event = asyncio.Event()
    waiting_conn_lost_task = asyncio.create_task(connection_lost_event.wait())
    waiting_task = asyncio.create_task(wait_any({working_task, waiting_conn_lost_task}))
    await asyncio.sleep(2)
    connection_lost_event.set()  # <---
    await check_tasks(waiting_task, working_task, waiting_conn_lost_task)


async def cancel_waiting_case():
    working_task = asyncio.create_task(task())
    connection_lost_event = asyncio.Event()
    waiting_conn_lost_task = asyncio.create_task(connection_lost_event.wait())
    waiting_task = asyncio.create_task(wait_any({working_task, waiting_conn_lost_task}))
    await asyncio.sleep(2)
    waiting_task.cancel()  # <---
    await check_tasks(waiting_task, working_task, waiting_conn_lost_task)


async def main():
    print("Work done")
    print("-------------------")
    await work_done_case()
    print("\nConnection lost")
    print("-------------------")
    await conn_lost_case()
    print("\nCancel waiting")
    print("-------------------")
    await cancel_waiting_case()


asyncio.run(main())

Output:

Work done
-------------------
waiting is done
waiting connection lost is cancelled
work is done

Connection lost
-------------------
waiting is done
connection is lost
work is cancelled

Cancel waiting
-------------------
waiting is cancelled
waiting connection lost is cancelled
work is cancelled

You are correct that asyncio.gather() and asyncio.wait() have similar uses. Both functions are used to run multiple coroutines concurrently. However, there are some differences between the two.

asyncio.gather() is used to run multiple coroutines concurrently and wait for them all to complete. It returns the results of all the coroutines as a list, in the order in which they were passed to the function. If any of the coroutines raises an exception, asyncio.gather() propagates the first such exception to the caller, unless return_exceptions=True is passed, in which case exceptions are returned in the result list alongside normal results.

asyncio.wait() is used to wait for one or more tasks to complete. It returns two sets of tasks: those that have completed and those that have not. You can use this function to wait for a specific condition in a collection of tasks, such as all tasks completing, the first task completing, or the first task failing.

So, while both functions can be used for similar purposes, they have different use cases. Use asyncio.gather() when you want to run multiple coroutines concurrently and collect all of their results. Use asyncio.wait() when you need finer control over when the wait ends and want to inspect the done and pending tasks yourself.
