async generator with slow consumer
If I have a slow consumer of an async generator that emits values at a fast frequency, and I only care about consuming the latest value (i.e. I don't mind dropping values), is there a way to achieve this elegantly? I have looked at aiostream, but I can't seem to find anything suitable.

Here is a simple example:
import asyncio
import aiostream

async def main():
    xs = aiostream.stream.count(interval=0.2)
    async with xs.stream() as stream:
        async for x in stream:  # do something here to drop updates that aren't processed in time
            print(x)
            await asyncio.sleep(1.0)

if __name__ == "__main__":
    asyncio.run(main())
I suggest you use a class that wraps the external generator, since I am not aware of any library that does this out of the box.

The class can consume the generator inside a task and keep only the last value. It acts as a wrapper around the generator you actually want to consume.
import asyncio

class RelaxedGenerator:
    def __init__(self, async_gen):
        self.last_value = None       # the last value generated
        self.consumed_last = True    # flags the last value as consumed
        self.async_gen = async_gen   # generator whose values we may drop
        self.exhausted = False       # flags the generator as fully consumed

    @classmethod
    async def start(cls, async_gen):
        self = cls(async_gen())
        asyncio.create_task(self.generate())
        return self

    async def generate(self):
        # consume the external async generator here,
        # saving only the last value for further processing
        while True:
            try:
                self.last_value = await self.async_gen.__anext__()
                self.consumed_last = False
            except StopAsyncIteration:
                self.exhausted = True
                break

    async def stream(self):
        while not self.exhausted:
            if self.consumed_last:
                await asyncio.sleep(0.01)  # avoid blocking the event loop
                continue
            self.consumed_last = True
            yield self.last_value
Testing with a simple generator:
import asyncio
from random import uniform

async def numbers_stream(max_=100):
    next_int = -1
    while next_int < max_:
        next_int += 1
        yield next_int
        await asyncio.sleep(0.2)

async def main():
    gen = await RelaxedGenerator.start(numbers_stream)
    async for value in gen.stream():
        print(value, end=", ", flush=True)
        await asyncio.sleep(uniform(1, 2))

asyncio.run(main())
Output:
0, 6, 15, 21, 28, 38, 43, 48, 57, 65, 73, 81, 89, 96,
Other things to keep in mind are whether you need to process the very last value, and whether the generator you are working with will in practice ever be exhausted. Here I assume that you don't care about the last value and that the generator may be exhausted.
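As a side note, the fixed 0.01 s polling sleep in stream() can be avoided by waking the consumer with an asyncio.Event only when a fresh value arrives. The sketch below is illustrative (the LatestValue class and its method names are my own, not from the answer above), but it shows the idea under the same assumption that intermediate and final values may be dropped:

```python
import asyncio

class LatestValue:
    """Keeps only the newest value from a producer; the consumer waits on an
    event instead of polling. Illustrative sketch, not the answer's code."""

    def __init__(self):
        self._value = None
        self._fresh = asyncio.Event()   # set whenever an unconsumed value exists
        self._done = False

    def publish(self, value):
        self._value = value             # overwrite: older values are dropped
        self._fresh.set()

    def close(self):
        self._done = True
        self._fresh.set()               # wake any waiting consumer so it can exit

    async def stream(self):
        while True:
            await self._fresh.wait()    # no busy-wait: suspends until publish/close
            self._fresh.clear()
            if self._done:
                return
            yield self._value

async def demo():
    latest = LatestValue()

    async def producer():
        for i in range(10):
            latest.publish(i)
            await asyncio.sleep(0.01)   # fast producer
        latest.close()

    asyncio.create_task(producer())
    seen = []
    async for v in latest.stream():
        seen.append(v)
        await asyncio.sleep(0.035)      # slow consumer: intermediate values are skipped
    return seen
```

Running asyncio.run(demo()) yields a strictly increasing subsequence of 0..9, with the values the slow consumer missed simply dropped.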
You can add a queue between the producer and the consumer that forgets old results. Unfortunately there is no implementation of it in the standard library, but it is almost there. If you check the implementation of asyncio.Queue, you will notice the use of collections.deque, see https://github.com/python/cpython/blob/3.10/Lib/asyncio/queues.py#L49. collections.deque takes an optional argument maxlen that makes it discard previously added items, see https://docs.python.org/3/library/collections.html#collections.deque. Using that, we can create a custom queue that keeps only the last n items.
import asyncio
import collections

class RollingQueue(asyncio.Queue):
    def _init(self, maxsize):
        self._queue = collections.deque(maxlen=maxsize)

    def full(self):
        return False
Now you can use this queue as follows:
async def numbers(nmax):
    for n in range(nmax):
        yield n
        await asyncio.sleep(0.3)

async def fill_queue(producer, queue):
    async for item in producer:
        queue.put_nowait(item)
    queue.put_nowait(None)  # sentinel: the producer is exhausted

async def main():
    queue1 = RollingQueue(1)
    numgen = numbers(10)
    asyncio.create_task(fill_queue(numgen, queue1))
    while True:
        res = await queue1.get()
        if res is None:
            break
        print(res)
        await asyncio.sleep(1)

asyncio.run(main())
I set the queue size to 1 to keep only the last item, as requested in the question.
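To see the dropping behavior in isolation, the following small check exercises the RollingQueue class above with put_nowait/get_nowait only (repeated here so the snippet is self-contained):

```python
import asyncio
import collections

class RollingQueue(asyncio.Queue):
    def _init(self, maxsize):
        self._queue = collections.deque(maxlen=maxsize)

    def full(self):
        return False

async def check():
    q = RollingQueue(1)
    q.put_nowait("a")
    q.put_nowait("b")   # "a" is silently dropped: full() always reports False
    q.put_nowait("c")   # "b" is dropped too; only the newest item remains
    return q.get_nowait()

print(asyncio.run(check()))  # → c
```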
Combining the two answers provided, I came up with the following solution, which seems to work well:
import asyncio
import aiostream
import collections

class RollingQueue(asyncio.Queue):
    def _init(self, maxsize):
        self._queue = collections.deque(maxlen=maxsize)

    def full(self):
        return False

@aiostream.operator(pipable=True)
async def drop_stream(source, max_n=1):
    queue = RollingQueue(max_n)
    done = object()  # sentinel so the consumer is not left waiting forever

    async def inner_task():
        async with aiostream.streamcontext(source) as streamer:
            async for item in streamer:
                queue.put_nowait(item)
        queue.put_nowait(done)  # may displace a still-unread newest item,
                                # consistent with the dropping semantics

    task = asyncio.create_task(inner_task())
    try:
        while True:
            item = await queue.get()
            if item is done:
                break
            yield item
    finally:
        task.cancel()

async def main():
    xs = (aiostream.stream.count(interval=0.2)
          | drop_stream.pipe(1)
          | aiostream.pipe.take(5))
    async with xs.stream() as stream:
        async for x in stream:
            print(x)
            await asyncio.sleep(1.0)

if __name__ == "__main__":
    asyncio.run(main())
Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.