
Wrapping asyncio.gather in a timeout

I've seen asyncio.gather vs asyncio.wait, but am not sure whether it addresses this particular question. What I'm looking to do is wrap the asyncio.gather() coroutine in asyncio.wait_for(), with a timeout argument. I also need to satisfy these conditions:

  • return_exceptions=True (from asyncio.gather()): rather than propagating exceptions to the task that awaits on gather(), I want to include exception instances in the results.
  • Order: retain the property of asyncio.gather() that the order of results is the same as the order of the input (or otherwise, map the output back to the input). asyncio.wait_for() fails this criterion, and I'm not sure of the ideal way to achieve it.

The timeout is for the entire asyncio.gather() across the list of awaitables. If an awaitable gets caught by the timeout or raises an exception, either case should just place an exception instance in the result list.
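To make the wait_for() problem concrete, here is a minimal sketch, assuming Python 3.8+; work(), gather_under_wait_for() and demo() are hypothetical names for illustration. When the deadline passes, wait_for() cancels the wrapped gather() and raises asyncio.TimeoutError, so even the coroutines that finished in time contribute nothing: there is no result list at all, ordered or otherwise.

```python
import asyncio


async def work(delay: float) -> float:
    # Sleep for `delay` seconds, then return it.
    await asyncio.sleep(delay)
    return delay


async def gather_under_wait_for(delays, timeout):
    # The whole gather() wrapped in wait_for(), as described above.
    return await asyncio.wait_for(
        asyncio.gather(*(work(d) for d in delays), return_exceptions=True),
        timeout=timeout,
    )


def demo():
    try:
        return asyncio.run(gather_under_wait_for([0.01, 0.02, 1.0], timeout=0.1))
    except asyncio.TimeoutError:
        # The two fast coroutines finished, but wait_for() raised instead of
        # returning their results, so nothing usable comes back.
        return None


print(demo())  # None
```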

Consider this setup:

>>> import asyncio
>>> import random
>>> from time import perf_counter
>>> from typing import Iterable
>>> from pprint import pprint
>>> 
>>> async def coro(i, threshold=0.4):
...     await asyncio.sleep(i)
...     if i > threshold:
...         # For illustration's sake - some coroutines may raise,
...         # and we want to accommodate that and just test for exception
...         # instances in the results of asyncio.gather(return_exceptions=True)
...         raise Exception("i too high")
...     return i
... 
>>> async def main(n, it: Iterable):
...     res = await asyncio.gather(
...         *(coro(i) for i in it),
...         return_exceptions=True
...     )
...     return res
... 
>>> 
>>> random.seed(444)
>>> n = 10
>>> it = [random.random() for _ in range(n)]
>>> start = perf_counter()
>>> res = asyncio.run(main(n, it=it))
>>> elapsed = perf_counter() - start
>>> print(f"Done main({n}) in {elapsed:0.2f} seconds")  # Expectation: ~1 second
Done main(10) in 0.86 seconds
>>> pprint(dict(zip(it, res)))
{0.01323751590501987: 0.01323751590501987,
 0.07422124156714727: 0.07422124156714727,
 0.3088946587429545: 0.3088946587429545,
 0.3113884366691503: 0.3113884366691503,
 0.4419557492849159: Exception('i too high'),
 0.4844375347808497: Exception('i too high'),
 0.5796792804615848: Exception('i too high'),
 0.6338658027451068: Exception('i too high'),
 0.7426396870165088: Exception('i too high'),
 0.8614799253779063: Exception('i too high')}

The program above, with n = 10, has an expected runtime of a little under 1 second plus a bit of overhead when run asynchronously: gather() finishes when its slowest coroutine does, and since random.random() is uniformly distributed in [0, 1), the expected maximum of 10 draws is 10/11 ≈ 0.91 seconds.

Let's say I want to impose a timeout of 0.5 seconds on the entire operation (i.e. on the coroutine main()):

timeout = 0.5

Now, I can use asyncio.wait(), but the problem is that the results are set objects, so they definitely can't guarantee the ordered-results property of asyncio.gather():

>>> async def main(n, it, timeout) -> tuple:
...     tasks = [asyncio.create_task(coro(i)) for i in it]
...     done, pending = await asyncio.wait(tasks, timeout=timeout)
...     return done, pending
... 
>>> timeout = 0.5
>>> random.seed(444)
>>> it = [random.random() for _ in range(n)]
>>> start = perf_counter()
>>> done, pending = asyncio.run(main(n, it=it, timeout=timeout))
>>> for i in pending:
...     i.cancel()  # asyncio.run() already cancelled leftover tasks on exit, so this is a no-op
>>> elapsed = perf_counter() - start
>>> print(f"Done main({n}) in {elapsed:0.2f} seconds")
Done main(10) in 0.50 seconds
>>> done
{<Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>, <Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>, <Task finished coro=<coro() done, defined at <stdin>:1> result=0.3088946587429545>, <Task finished coro=<coro() done, defined at <stdin>:1> result=0.3113884366691503>, <Task finished coro=<coro() done, defined at <stdin>:1> result=0.01323751590501987>, <Task finished coro=<coro() done, defined at <stdin>:1> result=0.07422124156714727>}
>>> pprint(done)
{<Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.3113884366691503>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.07422124156714727>,
 <Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.01323751590501987>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.3088946587429545>}
>>> pprint(pending)
{<Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>}

As stated above, the issue is that I seemingly can't map the task instances back to the inputs in it. The tasks list is effectively lost inside main(), the function scope where tasks = [asyncio.create_task(coro(i)) for i in it] is created. Is there a Pythonic way, or a use of the asyncio API, to mimic the behavior of asyncio.gather() here?

Taking a look at the underlying _wait() coroutine: asyncio.wait() hands it the task objects themselves, and the done and pending sets it returns contain those same objects rather than copies. The tasks keep running on the event loop (and changing state) while wait() awaits them, so within the scope of main(), the elements of tasks = [asyncio.create_task(coro(i)) for i in it] reflect whatever happened before await asyncio.wait(tasks, timeout=timeout) returned. Instead of returning a (done, pending) tuple, one workaround is to just return tasks itself, which retains the order of the input it. wait() / _wait() just separates the tasks into done/pending subsets; in this case we can discard those subsets and use the whole tasks list, whose elements have been updated in place.
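A quick sketch (Python 3.8+, with hypothetical work() and inspect_wait() helpers) that checks these claims: the done and pending sets hold the very same Task objects as the tasks list, and iterating that list recovers input order. It also shows something worth knowing: asyncio.wait() leaves pending tasks running rather than cancelling them, so the sketch cancels them itself before returning.

```python
import asyncio


async def work(delay: float) -> float:
    # Sleep for `delay` seconds, then return it.
    await asyncio.sleep(delay)
    return delay


async def inspect_wait(delays, timeout):
    tasks = [asyncio.create_task(work(d)) for d in delays]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    # wait() returns the same Task objects, just partitioned into two sets.
    same_objects = (done | pending) == set(tasks)
    # wait() does not cancel the stragglers: they are still running here.
    still_running = all(not t.done() for t in pending)
    # Walking the original list gives each task's state in input order.
    states = ["done" if t.done() else "pending" for t in tasks]
    # Clean up explicitly before returning.
    for t in pending:
        t.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return same_objects, still_running, states


print(asyncio.run(inspect_wait([0.5, 0.01, 0.02], timeout=0.1)))
# (True, True, ['pending', 'done', 'done'])
```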

There are three possible task states in this case:

  • A task returned a valid result: coro() didn't raise an exception, and it finished within the timeout. Its .cancelled() will be False, its .exception() will be None, and its .result() is the return value.
  • A task got hit by the timeout before it had a chance to return a result or raise an exception. asyncio.wait() itself does not cancel such a task; in this example it ends up cancelled because asyncio.run() cancels all remaining tasks when main() returns. Its .cancelled() will be True, and both .result() and .exception() will raise CancelledError.
  • A task was allowed time to finish and raised an exception from coro(). Its .cancelled() will be False, its .exception() will return the exception instance, and .result() will re-raise it.

(All of this is laid out in asyncio/futures.py.)
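The three states can be folded into one gather()-style value per task. This is a minimal sketch, assuming the future is already finished or cancelled; outcome() and demo() are hypothetical names. Futures created with loop.create_future() stand in for tasks here, since Task is a Future subclass with the same .cancelled() / .exception() / .result() behavior.

```python
import asyncio


def outcome(fut: asyncio.Future):
    """Map a finished (or cancelled) future to a gather()-style value."""
    if not fut.done():
        raise asyncio.InvalidStateError("future is still pending")
    if fut.cancelled():
        # State 2: cut off by the timeout (and then cancelled).
        return asyncio.CancelledError()
    exc = fut.exception()  # None unless the coroutine raised
    # State 3 yields the exception instance; state 1 yields the value.
    return exc if exc is not None else fut.result()


async def demo():
    loop = asyncio.get_running_loop()
    ok, failed, cut_off = (loop.create_future() for _ in range(3))
    ok.set_result(0.3)
    failed.set_exception(Exception("i too high"))
    cut_off.cancel()
    return [outcome(f) for f in (ok, failed, cut_off)]


print(asyncio.run(demo()))
```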


Illustration:

>>> # imports/other code snippets - see question
>>> async def main(n, it, timeout) -> list:
...     tasks = [asyncio.create_task(coro(i)) for i in it]
...     await asyncio.wait(tasks, timeout=timeout)
...     return tasks  # *not* (done, pending)

>>> timeout = 0.5
>>> random.seed(444)
>>> n = 10
>>> it = [random.random() for _ in range(n)]
>>> start = perf_counter()
>>> tasks = asyncio.run(main(n, it=it, timeout=timeout))
>>> elapsed = perf_counter() - start
>>> print(f"Done main({n}) in {elapsed:0.2f} seconds")
Done main(10) in 0.50 seconds

>>> pprint(tasks)
[<Task finished coro=<coro() done, defined at <stdin>:1> result=0.3088946587429545>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.01323751590501987>,
 <Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.3113884366691503>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.07422124156714727>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>]

Now apply the logic from above; iterating over tasks lets res retain the order of the inputs:

>>> res = []
>>> for t in tasks:
...     try:
...         r = t.result()
...     except (Exception, asyncio.CancelledError) as e:  # CancelledError is a BaseException on Python 3.8+
...         res.append(e)
...     else:
...         res.append(r)
>>> pprint(res)
[0.3088946587429545,
 0.01323751590501987,
 Exception('i too high'),
 CancelledError(),
 CancelledError(),
 CancelledError(),
 Exception('i too high'),
 0.3113884366691503,
 0.07422124156714727,
 CancelledError()]
>>> dict(zip(it, res))
{0.3088946587429545: 0.3088946587429545,
 0.01323751590501987: 0.01323751590501987,
 0.4844375347808497: Exception('i too high'),
 0.8614799253779063: concurrent.futures._base.CancelledError(),
 0.7426396870165088: concurrent.futures._base.CancelledError(),
 0.6338658027451068: concurrent.futures._base.CancelledError(),
 0.4419557492849159: Exception('i too high'),
 0.3113884366691503: 0.3113884366691503,
 0.07422124156714727: 0.07422124156714727,
 0.5796792804615848: concurrent.futures._base.CancelledError()}
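One caveat with the approach above: the cancelled entries come from asyncio.run() tearing down leftover tasks, not from the timeout, so the same extraction done inside main() would hit still-running tasks (their .result() raises InvalidStateError). Below is a minimal sketch, assuming Python 3.8+, that keeps everything inside the event loop. gather_with_timeout() is a hypothetical helper, not an asyncio API: it cancels the stragglers itself, waits for the cancellations to settle, and then reads results in input order.

```python
import asyncio
from typing import Any, Awaitable, Iterable, List


async def gather_with_timeout(aws: Iterable[Awaitable[Any]], timeout: float) -> List[Any]:
    """Hypothetical helper: gather(return_exceptions=True) with one overall timeout.

    Results keep input order. An awaitable that raised contributes its exception;
    one still running at the deadline contributes a CancelledError instance.
    """
    tasks = [asyncio.ensure_future(aw) for aw in aws]
    if not tasks:
        return []  # asyncio.wait() raises ValueError on an empty collection
    _, pending = await asyncio.wait(tasks, timeout=timeout)
    # asyncio.wait() leaves stragglers running, so cancel them explicitly
    # rather than relying on asyncio.run() to do it at shutdown.
    for t in pending:
        t.cancel()
    if pending:
        # Let the cancellations settle so every task reaches a final state.
        await asyncio.wait(pending)
    results: List[Any] = []
    for t in tasks:  # iterate the original list to keep input order
        if t.cancelled():
            results.append(asyncio.CancelledError())
        elif t.exception() is not None:
            results.append(t.exception())
        else:
            results.append(t.result())
    return results


async def coro(i, threshold=0.4):
    # Same shape as the question's coro(): sleep, then maybe raise.
    await asyncio.sleep(i)
    if i > threshold:
        raise Exception("i too high")
    return i


async def main():
    return await gather_with_timeout([coro(i) for i in (0.01, 0.45, 1.5)], timeout=0.7)


print(asyncio.run(main()))
```

Returning a CancelledError instance for timed-out entries mirrors the output above; putting asyncio.TimeoutError() in those slots instead is a one-line change if that reads better to callers.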

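Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.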

 
© 2020-2024 STACKOOM.COM