在超时中包装asyncio.gather

Question

I've seen asyncio.gather vs asyncio.wait , but am not sure if that addresses this particular question. 我看过asyncio.gather vs asyncio.wait ，但是不确定是否可以解决这个特定问题。 What I'm looking to do is wrap the asyncio.gather() coroutine in asyncio.wait_for() , with a timeout argument. 我想要做的是将asyncio.gather()协程包装在asyncio.wait_for() ，并带有一个timeout参数。 I also need to satisfy these conditions: 我还需要满足以下条件：

return_exceptions=True (from asyncio.gather() ) - rather than propagating exceptions to the task that awaits on gather() , I want to include exceptions instances in the results return_exceptions=True （从asyncio.gather() -而不是传播例外的是在等待任务gather()我想在结果中包括例外情况
Order: retain the property of asyncio.gather() that the order of results is the same as the order of the input. 顺序：保留asyncio.gather()的属性，即结果的顺序与输入的顺序相同。 (Or otherwise, map the output back to the input.). （否则，将输出映射回输入。）。 asyncio.wait_for() fails this criteria and I'm not sure of ideal way to achieve it. asyncio.wait_for()不符合此条件，我不确定实现此标准的理想方法。

The timeout is for the entire asyncio.gather() across the list of awaitables--if they get caught in the timeout or return an exception, either of those cases should just place an exception instance in the result list. 超时适用于整个 asyncio.gather()列表中的整个 asyncio.gather() －如果它们陷入超时或返回异常，则这两种情况中的任何一种都应仅将异常实例放置在结果列表中。

Consider this setup: 考虑以下设置：

>>> import asyncio
>>> import random
>>> from time import perf_counter
>>> from typing import Iterable
>>> from pprint import pprint
>>> 
>>> async def coro(i, threshold=0.4):
...     await asyncio.sleep(i)
...     if i > threshold:
...         # For illustration's sake - some coroutines may raise,
...         # and we want to accomodate that and just test for exception
...         # instances in the results of asyncio.gather(return_exceptions=True)
...         raise Exception("i too high")
...     return i
... 
>>> async def main(n, it: Iterable):
...     res = await asyncio.gather(
...         *(coro(i) for i in it),
...         return_exceptions=True
...     )
...     return res
... 
>>> 
>>> random.seed(444)
>>> n = 10
>>> it = [random.random() for _ in range(n)]
>>> start = perf_counter()
>>> res = asyncio.run(main(n, it=it))
>>> elapsed = perf_counter() - start
>>> print(f"Done main({n}) in {elapsed:0.2f} seconds")  # Expectation: ~1 seconds
Done main(10) in 0.86 seconds
>>> pprint(dict(zip(it, res)))
{0.01323751590501987: 0.01323751590501987,
 0.07422124156714727: 0.07422124156714727,
 0.3088946587429545: 0.3088946587429545,
 0.3113884366691503: 0.3113884366691503,
 0.4419557492849159: Exception('i too high'),
 0.4844375347808497: Exception('i too high'),
 0.5796792804615848: Exception('i too high'),
 0.6338658027451068: Exception('i too high'),
 0.7426396870165088: Exception('i too high'),
 0.8614799253779063: Exception('i too high')}

The program above, with n = 10 , has an exected runtime of .5 seconds plus a bit of overhead when run asynchronously. 上面的程序（ n = 10 ）的预期运行时间为.5秒，异步运行时会产生一些开销。 ( random.random() will be uniformly distributed in [0, 1).) （ random.random()将在[0，1 random.random()中均匀分布。）

Let's say I want to impose that as the timeout, on the entire operation (ie on the coroutine main() ): 假设我要在整个操作中（例如，在协程main() ）将其强加为超时：

timeout = 0.5

Now, I can use asyncio.wait() , but the problem is that the results are set objects and so definitely can't guarantee the sorted return value property of asyncio.gather() : 现在，我可以使用asyncio.wait() ，但问题是结果是set对象，因此绝对不能保证asyncio.gather()的排序返回值属性：

>>> async def main(n, it, timeout) -> tuple:
...     tasks = [asyncio.create_task(coro(i)) for i in it]
...     done, pending = await asyncio.wait(tasks, timeout=timeout)
...     return done, pending
... 
>>> timeout = 0.5
>>> random.seed(444)
>>> it = [random.random() for _ in range(n)]
>>> start = perf_counter()
>>> done, pending = asyncio.run(main(n, it=it, timeout=timeout))
>>> for i in pending:
...     i.cancel()
>>> elapsed = perf_counter() - start
>>> print(f"Done main({n}) in {elapsed:0.2f} seconds")
Done main(10) in 0.50 seconds
>>> done
{<Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>, <Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>, <Task finished coro=<coro() done, defined at <stdin>:1> result=0.3088946587429545>, <Task finished coro=<coro() done, defined at <stdin>:1> result=0.3113884366691503>, <Task finished coro=<coro() done, defined at <stdin>:1> result=0.01323751590501987>, <Task finished coro=<coro() done, defined at <stdin>:1> result=0.07422124156714727>}
>>> pprint(done)
{<Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.3113884366691503>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.07422124156714727>,
 <Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.01323751590501987>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.3088946587429545>}
>>> pprint(pending)
{<Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>}

As stated above, the issue is that I seemingly can't map back task instances to the inputs in iterable . 如上所述，问题是我似乎无法将task实例映射回iterable的输入。 They task ids are effectively lost inside a function scope with tasks = [asyncio.create_task(coro(i)) for i in it] . 它们的任务ID实际上在具有tasks = [asyncio.create_task(coro(i)) for i in it]的函数）范围内丢失。 Is there a Pythonic way/use of asyncio API to mimic the behavior of asyncio.gather() here? 是否有Pythonic方式/使用asyncio API来模仿asyncio.gather()的行为？

Answer 1

Taking a look at the underlying _wait() coroutine, this coroutine gets passed a list of tasks and will modify the state of those tasks in place. 看一下底层的_wait()协程，此协程将传递一个任务列表，并将修改这些任务的状态。 This means that, within the scope of main() , the tasks from tasks = [asyncio.create_task(coro(i)) for i in it] will be modified by the call to await asyncio.wait(tasks, timeout=timeout) . 这意味着，在的范围内的main()时， tasks从tasks = [asyncio.create_task(coro(i)) for i in it]将由呼叫进行修改，以await asyncio.wait(tasks, timeout=timeout) 。 Instead of returning a (done, pending) tuple, one workaround is to just return tasks themselves, which retains order with the input it . 而不是返回的(done, pending)元组，一个解决方法是只返回tasks本身，它保留顺序与输入it 。 wait() / _wait() just separates the tasks into done/pending subsets and in this case we can discard those subsets and use the whole lists of tasks whose elements have been altered. wait() / _wait()只是将任务分成完成/待处理的子集，在这种情况下，我们可以丢弃这些子集，并使用其元素已更改的tasks的整个列表。

There are three possible tasks states in this case: 在这种情况下，存在三种可能的任务状态：

A task returned a valid result ( coro() ) didn't raise an exception, and it finished under the timeout . 一个任务返回了一个有效的结果（ coro() ）没有引发异常，它在timeout之下完成了。 Its .cancelled() will be False, and it has a valid .result() that is not an exception instance 它的.cancelled()将为False，并且具有有效的.result() ，它不是异常实例
A task got hit with the timeout before having a chance to return either a result or raise an exception. 任务因超时而受到打击，之后才有机会返回结果或引发异常。 It will show .cancelled() and its .exception() will raise a CancelledError 它将显示.cancelled()及其.exception()将引发CancelledError
A task that was allowed time to finished and raised an exception from coro() ; 一个任务，它有时间完成，并从coro()引发了异常； it will show .cancelled() as False and its exception() will raise 它将显示.cancelled()为False，并且其exception()将引发

(All of this is laid out in asyncio/futures.py .) （所有这些都放在asyncio / futures.py中。）

Illustration: 插图：

>>> # imports/other code snippets - see question
>>> async def main(n, it, timeout) -> tuple:
...     tasks = [asyncio.create_task(coro(i)) for i in it]
...     await asyncio.wait(tasks, timeout=timeout)
...     return tasks  # *not* (done, pending)

>>> timeout = 0.5
>>> random.seed(444)
>>> n = 10
>>> it = [random.random() for _ in range(n)]
>>> start = perf_counter()
>>> tasks = asyncio.run(main(n, it=it, timeout=timeout))
>>> elapsed = perf_counter() - start
>>> print(f"Done main({n}) in {elapsed:0.2f} seconds")
Done main(10) in 0.50 seconds

>>> pprint(tasks)
[<Task finished coro=<coro() done, defined at <stdin>:1> result=0.3088946587429545>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.01323751590501987>,
 <Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.3113884366691503>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.07422124156714727>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>]

Now to apply the logic from above, which lets res retain order corresponding to the inputs: 现在应用上面的逻辑，让res保留与输入相对应的顺序：

>>> res = []
>>> for t in tasks:
...     try:
...         r = t.result()
...     except Exception as e:
...         res.append(e)
...     else:
...         res.append(r)
>>> pprint(res)
[0.3088946587429545,
 0.01323751590501987,
 Exception('i too high'),
 CancelledError(),
 CancelledError(),
 CancelledError(),
 Exception('i too high'),
 0.3113884366691503,
 0.07422124156714727,
 CancelledError()]
>>> dict(zip(it, res))
{0.3088946587429545: 0.3088946587429545,
 0.01323751590501987: 0.01323751590501987,
 0.4844375347808497: Exception('i too high'),
 0.8614799253779063: concurrent.futures._base.CancelledError(),
 0.7426396870165088: concurrent.futures._base.CancelledError(),
 0.6338658027451068: concurrent.futures._base.CancelledError(),
 0.4419557492849159: Exception('i too high'),
 0.3113884366691503: 0.3113884366691503,
 0.07422124156714727: 0.07422124156714727,
 0.5796792804615848: concurrent.futures._base.CancelledError()}

在超时中包装asyncio.gather

问题描述

1 个解决方案

解决方案1
0 已采纳 2019-01-29 18:44:10

在超时中包装asyncio.gather

问题描述

1 个解决方案

解决方案1 0 已采纳 2019-01-29 18:44:10

解决方案1
0 已采纳 2019-01-29 18:44:10