
“Lazy” version of asyncio.gather?

I'm using Python's asyncio module and async / await to process a character sequence in chunks concurrently and collect the results in a list. For that I'm using a chunker function (split) and a chunk-processing function (process_chunk). They both come from a third-party library, and I would prefer not to change them.

Chunking is slow, and the number of chunks is not known up front, which is why I don't want to consume the whole chunk generator at once. Ideally, the code should advance the generator in sync with process_chunk's semaphore, i.e., every time that function returns.

My code

import asyncio

def split(sequence):
    for x in sequence:
        print('Getting the next chunk:', x)
        yield x
    print('Finished chunking')

async def process_chunk(chunk, *, semaphore=asyncio.Semaphore(2)):
    async with semaphore:
        print('Processing chunk:', chunk)
        await asyncio.sleep(3)
        return 'OK'

async def process_in_chunks(sequence):
    gen = split(sequence)
    coro = [process_chunk(chunk) for chunk in gen]
    results = await asyncio.gather(*coro)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(process_in_chunks('ABC'))

kind of works and prints

Getting the next chunk: A
Getting the next chunk: B
Getting the next chunk: C
Finished chunking
Processing chunk: C
Processing chunk: B
Processing chunk: A

although that means that the gen generator is exhausted before the processing begins. I know why it happens, but how do I change that?

If you don't mind having an external dependency, you can use aiostream.stream.map:

from aiostream import stream, pipe

async def process_in_chunks(sequence):
    # Asynchronous sequence of chunks
    xs = stream.iterate(split(sequence))
    # Asynchronous sequence of results
    ys = xs | pipe.map(process_chunk, task_limit=2)
    # Aggregation of the results into a list
    zs = ys | pipe.list()
    # Run the stream
    results = await zs
    print(results)

The chunks are generated lazily and fed to the process_chunk coroutine. The number of coroutines running concurrently is controlled by task_limit. That means the semaphore in process_chunk is no longer necessary.
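
The pipe syntax is shorthand for aiostream's plain operator calls; the same pipeline can also be written without pipes. A minimal sketch based on aiostream's operator API, assuming the same split and process_chunk as above:

import asyncio
from aiostream import stream

async def process_in_chunks(sequence):
    xs = stream.iterate(split(sequence))              # lazily wrap the generator
    ys = stream.map(xs, process_chunk, task_limit=2)  # at most 2 chunks in flight
    results = await stream.list(ys)                   # run the stream, collect results
    print(results)

if __name__ == '__main__':
    asyncio.run(process_in_chunks('ABC'))  # Python 3.7+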

Output:

Getting the next chunk: A
Processing chunk: A
Getting the next chunk: B
Processing chunk: B
# Pause 3 seconds
Getting the next chunk: C
Processing chunk: C
Finished chunking
# Pause 3 seconds
['OK', 'OK', 'OK']

See more examples in this demonstration and the documentation.

  • Use next to iterate through gen manually
  • Acquire the semaphore before getting and processing a chunk
  • Release the semaphore after the chunk has been processed


import asyncio


# third-party:
def split(sequence):
    for x in sequence:
        print('Getting the next chunk:', x)
        yield x
    print('Finished chunking')


async def process_chunk(chunk, *, semaphore=asyncio.Semaphore(2)):
    async with semaphore:
        print('Processing chunk:', chunk)
        await asyncio.sleep(3)
        return 'OK'


# our code:
sem = asyncio.Semaphore(2)  # let's use our semaphore


async def process_in_chunks(sequence):    
    tasks = []
    gen = split(sequence)
    while True:
        await sem.acquire()
        try:
            chunk = next(gen)
        except StopIteration:
            break
        else:
            task = asyncio.ensure_future(process_chunk(chunk))  # task to run concurrently
            task.add_done_callback(lambda *_: sem.release())  # allow the next chunk to be fetched
            tasks.append(task)
    await asyncio.gather(*tasks, return_exceptions=True)  # await all pending tasks
    results = [task.result() for task in tasks]
    return results


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(process_in_chunks('ABCDE'))
    finally:
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()

Output:

Getting the next chunk: A
Getting the next chunk: B
Processing chunk: A
Processing chunk: B
Getting the next chunk: C
Getting the next chunk: D
Processing chunk: C
Processing chunk: D
Getting the next chunk: E
Finished chunking
Processing chunk: E
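
As an aside, on Python 3.7 and later the event-loop boilerplate above can be shortened with asyncio.run, which creates a fresh loop, runs the coroutine, and shuts down async generators on exit (the same cleanup the finally block does here). A sketch; note that on Pythons before 3.10, an asyncio.Semaphore created at module import time could end up bound to a different loop than the one asyncio.run creates, so the module-level semaphores may need to be constructed inside a coroutine there:

import asyncio

if __name__ == '__main__':
    # asyncio.run handles loop creation, shutdown_asyncgens and closing
    results = asyncio.run(process_in_chunks('ABCDE'))
    print(results)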
