Wrapping asyncio.gather in a timeout

Question

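I've seen asyncio.gather vs asyncio.wait, but I'm not sure whether it addresses this particular question. What I want to do is wrap an asyncio.gather() call in asyncio.wait_for(), with a timeout argument. I also need to satisfy these conditions: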

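return_exceptions=True (from asyncio.gather()): rather than propagating exceptions to the task that awaits gather(), I want to include exception instances in the results.
Order: retain the property of asyncio.gather() that the results come back in the same order as the inputs (or otherwise map the output back to the input). asyncio.wait_for() does not meet this criterion, and I'm not sure of the ideal way to achieve it.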

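The timeout applies to the entire asyncio.gather() across the whole list of awaitables. If an awaitable gets caught by the timeout or raises an exception, either case should simply place an exception instance in the result list.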
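For reference, the direct combination described above might look like the minimal sketch below. The names `work` and `naive` are hypothetical and not part of the original code. It illustrates why this combination alone is not enough: when the timeout fires, `asyncio.wait_for()` cancels the whole `gather()` and raises `asyncio.TimeoutError`, so the results of the awaitables that did finish in time are no longer available.

```python
import asyncio


async def work(delay):
    # Hypothetical stand-in for a real coroutine: sleep, then return the delay.
    await asyncio.sleep(delay)
    return delay


async def naive(delays, timeout):
    # Wrap the whole gather() in a single timeout.
    try:
        return await asyncio.wait_for(
            asyncio.gather(*(work(d) for d in delays), return_exceptions=True),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # All-or-nothing: the partial results are lost at this point.
        return None


fast = asyncio.run(naive([0.01, 0.02], timeout=0.5))  # everything finishes in time
slow = asyncio.run(naive([0.01, 2.0], timeout=0.2))   # one awaitable is too slow
```

In the second call the 0.01-second coroutine completed, but its result cannot be recovered from the timed-out `gather()`.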

Consider this setup:

>>> import asyncio
>>> import random
>>> from time import perf_counter
>>> from typing import Iterable
>>> from pprint import pprint
>>> 
>>> async def coro(i, threshold=0.4):
...     await asyncio.sleep(i)
...     if i > threshold:
...         # For illustration's sake - some coroutines may raise,
...         # and we want to accommodate that and just test for exception
...         # instances in the results of asyncio.gather(return_exceptions=True)
...         raise Exception("i too high")
...     return i
... 
>>> async def main(n, it: Iterable):
...     res = await asyncio.gather(
...         *(coro(i) for i in it),
...         return_exceptions=True
...     )
...     return res
... 
>>> 
>>> random.seed(444)
>>> n = 10
>>> it = [random.random() for _ in range(n)]
>>> start = perf_counter()
>>> res = asyncio.run(main(n, it=it))
>>> elapsed = perf_counter() - start
>>> print(f"Done main({n}) in {elapsed:0.2f} seconds")  # Expectation: ~1 seconds
Done main(10) in 0.86 seconds
>>> pprint(dict(zip(it, res)))
{0.01323751590501987: 0.01323751590501987,
 0.07422124156714727: 0.07422124156714727,
 0.3088946587429545: 0.3088946587429545,
 0.3113884366691503: 0.3113884366691503,
 0.4419557492849159: Exception('i too high'),
 0.4844375347808497: Exception('i too high'),
 0.5796792804615848: Exception('i too high'),
 0.6338658027451068: Exception('i too high'),
 0.7426396870165088: Exception('i too high'),
 0.8614799253779063: Exception('i too high')}

The program above, with n = 10, has an expected runtime of 0.5 seconds, plus a bit of overhead when run asynchronously. (random.random() is uniformly distributed in [0, 1).)

Say I want to impose that as a timeout on the whole operation (i.e., on the coroutine main()):

timeout = 0.5

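Now, I could use asyncio.wait(), but the problem is that its results are set objects, so it definitely can't guarantee the ordered-return-value property of asyncio.gather():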

>>> async def main(n, it, timeout) -> tuple:
...     tasks = [asyncio.create_task(coro(i)) for i in it]
...     done, pending = await asyncio.wait(tasks, timeout=timeout)
...     return done, pending
... 
>>> timeout = 0.5
>>> random.seed(444)
>>> it = [random.random() for _ in range(n)]
>>> start = perf_counter()
>>> done, pending = asyncio.run(main(n, it=it, timeout=timeout))
>>> for i in pending:
...     i.cancel()
>>> elapsed = perf_counter() - start
>>> print(f"Done main({n}) in {elapsed:0.2f} seconds")
Done main(10) in 0.50 seconds
>>> done
{<Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>, <Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>, <Task finished coro=<coro() done, defined at <stdin>:1> result=0.3088946587429545>, <Task finished coro=<coro() done, defined at <stdin>:1> result=0.3113884366691503>, <Task finished coro=<coro() done, defined at <stdin>:1> result=0.01323751590501987>, <Task finished coro=<coro() done, defined at <stdin>:1> result=0.07422124156714727>}
>>> pprint(done)
{<Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.3113884366691503>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.07422124156714727>,
 <Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.01323751590501987>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.3088946587429545>}
>>> pprint(pending)
{<Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>}

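As stated above, the problem is that I can't seem to map the task instances back to the inputs in the iterable. Their task identities are effectively lost within the scope of the function that holds tasks = [asyncio.create_task(coro(i)) for i in it]. Is there a Pythonic way, or a use of the asyncio API, to mimic the behavior of asyncio.gather() here?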

Answer 1

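Take a look at the underlying _wait() coroutine: it is passed a list of tasks, and the state of those tasks changes while it runs. This means that, within the scope of main(), the tasks created by tasks = [asyncio.create_task(coro(i)) for i in it] are modified by the call to await asyncio.wait(tasks, timeout=timeout). Instead of returning the (done, pending) tuple, one workaround is to return tasks itself, which keeps the same order as the input it. wait() / _wait() only splits the tasks into done and pending subsets; in this case we can discard those subsets and use the whole tasks list, whose elements have been altered.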
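A small sketch of this point, using a hypothetical `work` coroutine that is not part of the original code: the sets returned by `asyncio.wait()` hold the very same task objects as the input list, so the list can serve as the ordered view of them.

```python
import asyncio


async def work(delay):
    # Hypothetical stand-in coroutine.
    await asyncio.sleep(delay)
    return delay


async def main(delays, timeout):
    tasks = [asyncio.create_task(work(d)) for d in delays]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    # done and pending partition the same objects that `tasks` holds.
    same_objects = (done | pending) == set(tasks)
    # Walk the list (input order) and record which tasks finished in time.
    finished_in_order = [t in done for t in tasks]
    for t in pending:
        t.cancel()  # wait() does not cancel on timeout, so do it explicitly
    return same_objects, finished_in_order


same_objects, finished_in_order = asyncio.run(main([0.3, 0.01, 0.02], timeout=0.1))
```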

In this situation there are three possible task states:

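A task returned a valid result: coro() did not raise, and it finished within the timeout. Its .cancelled() will be False, and it has a valid .result() that is not an exception instance.
A task was hit by the timeout before it had a chance to return a result or raise an exception. It will show .cancelled() as True, and both its .result() and its .exception() will raise CancelledError. (asyncio.wait() does not cancel pending tasks itself; in this example they end up cancelled because asyncio.run() cancels any remaining tasks once main() returns.)
A task had time to complete and raised an exception from coro(). It will show .cancelled() as False, its .exception() returns the exception instance, and its .result() re-raises it.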

(All of this is laid out in asyncio/futures.py.)
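The three states above can be told apart with a small classifier. This is only a sketch: `describe` and `work` are hypothetical names, and the pending task is cancelled explicitly rather than left to `asyncio.run()`.

```python
import asyncio


def describe(task):
    # Classify a task that is already done (finished or cancelled).
    if task.cancelled():
        return "cancelled"
    if task.exception() is not None:
        return "raised"
    return "returned"


async def work(delay, fail=False):
    # Hypothetical stand-in coroutine.
    await asyncio.sleep(delay)
    if fail:
        raise ValueError("boom")
    return delay


async def main():
    tasks = [
        asyncio.create_task(work(0.01)),
        asyncio.create_task(work(0.01, fail=True)),
        asyncio.create_task(work(5)),
    ]
    done, pending = await asyncio.wait(tasks, timeout=0.2)
    for t in pending:
        t.cancel()
    # Give the cancelled tasks a chance to actually finish cancelling.
    await asyncio.gather(*pending, return_exceptions=True)
    return [describe(t) for t in tasks]


states = asyncio.run(main())
```

Checking `.cancelled()` first matters, because calling `.exception()` on a cancelled task raises CancelledError instead of returning a value.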

Illustration:

>>> # imports/other code snippets - see question
>>> async def main(n, it, timeout) -> tuple:
...     tasks = [asyncio.create_task(coro(i)) for i in it]
...     await asyncio.wait(tasks, timeout=timeout)
...     return tasks  # *not* (done, pending)

>>> timeout = 0.5
>>> random.seed(444)
>>> n = 10
>>> it = [random.random() for _ in range(n)]
>>> start = perf_counter()
>>> tasks = asyncio.run(main(n, it=it, timeout=timeout))
>>> elapsed = perf_counter() - start
>>> print(f"Done main({n}) in {elapsed:0.2f} seconds")
Done main(10) in 0.50 seconds

>>> pprint(tasks)
[<Task finished coro=<coro() done, defined at <stdin>:1> result=0.3088946587429545>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.01323751590501987>,
 <Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>,
 <Task finished coro=<coro() done, defined at <stdin>:1> exception=Exception('i too high')>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.3113884366691503>,
 <Task finished coro=<coro() done, defined at <stdin>:1> result=0.07422124156714727>,
 <Task cancelled coro=<coro() done, defined at <stdin>:1>>]

Now apply the logic above and build res so that it keeps the order corresponding to the inputs:

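>>> res = []
>>> for t in tasks:
...     # Name CancelledError explicitly: on Python 3.8+ it derives from
...     # BaseException, so a plain "except Exception" would not catch it.
...     try:
...         r = t.result()
...     except (Exception, asyncio.CancelledError) as e:
...         res.append(e)
...     else:
...         res.append(r)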
>>> pprint(res)
[0.3088946587429545,
 0.01323751590501987,
 Exception('i too high'),
 CancelledError(),
 CancelledError(),
 CancelledError(),
 Exception('i too high'),
 0.3113884366691503,
 0.07422124156714727,
 CancelledError()]
>>> dict(zip(it, res))
{0.3088946587429545: 0.3088946587429545,
 0.01323751590501987: 0.01323751590501987,
 0.4844375347808497: Exception('i too high'),
 0.8614799253779063: concurrent.futures._base.CancelledError(),
 0.7426396870165088: concurrent.futures._base.CancelledError(),
 0.6338658027451068: concurrent.futures._base.CancelledError(),
 0.4419557492849159: Exception('i too high'),
 0.3113884366691503: 0.3113884366691503,
 0.07422124156714727: 0.07422124156714727,
 0.5796792804615848: concurrent.futures._base.CancelledError()}
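Putting the pieces together, the approach could be packaged as a single helper. The following is a sketch under stated assumptions, not a standard asyncio API: `gather_with_timeout` and `work` are hypothetical names, and the helper cancels timed-out tasks itself instead of relying on `asyncio.run()` to clean them up.

```python
import asyncio


async def gather_with_timeout(coros, timeout):
    # Hypothetical helper: ordered results, with exceptions kept as values.
    tasks = [asyncio.ensure_future(c) for c in coros]
    if not tasks:
        return []  # asyncio.wait() rejects an empty collection
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for t in pending:
        t.cancel()
    if pending:
        # Let the cancellations be processed before inspecting the tasks.
        await asyncio.wait(pending)
    results = []
    for t in tasks:  # same order as the input
        if t.cancelled():
            results.append(asyncio.CancelledError())
        elif t.exception() is not None:
            results.append(t.exception())
        else:
            results.append(t.result())
    return results


async def work(delay):
    # Hypothetical stand-in coroutine: raises when the delay is "too high".
    await asyncio.sleep(delay)
    if delay > 0.05:
        raise ValueError("too slow")
    return delay


res = asyncio.run(gather_with_timeout([work(d) for d in (0.02, 0.1, 5)], timeout=0.5))
```

Here res holds a plain result, a ValueError instance, and a CancelledError instance, in input order. One caveat of this sketch: a coroutine that suppresses cancellation would keep the second `asyncio.wait()` waiting until it finishes.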

Solution 1
0 Accepted 2019-01-29 18:44:10
