
Use asyncio coroutine to run functions in parallel?

I have the following code, which reads data from a database ( read_db ) and writes it to a parquet file ( data.to_parquet ). Both I/O operations take a while to run.

import logging

def main():
    id = 0
    while id < 1000:
        logging.info(f'reading - id: {id}')
        data = read_db(id)  # returns a dataframe

        logging.info(f'saving - id: {id}')
        data.to_parquet(f'{id}.parquet')
        logging.info(f'saved - id: {id}')

        id += 1


It's slow, so I want read_db(n+1) and to_parquet(n) to run concurrently. Each step still has to finish in order, though ( read_db(n+1) must run after read_db(n) , and data.to_parquet(n+1) after data.to_parquet(n) ). Here is the asynchronous version:

import asyncio
from functools import partial, wraps

def async_wrap(f):
    @wraps(f)
    async def run(*args, loop=None, executor=None, **kwargs):
        if loop is None:
            loop = asyncio.get_event_loop()
        p = partial(f, *args, **kwargs)
        return await loop.run_in_executor(executor, p)
    return run

async def main():
    read_db_async = async_wrap(read_db)
    id = 0
    while id < 1000:
        logging.info(f'reading - id: {id}')
        data = await read_db_async(id)  # returns a dataframe

        logging.info(f'saving - id: {id}')
        to_parquet_async = async_wrap(data.to_parquet)
        await to_parquet_async(f'{id}.parquet')
        logging.info(f'saved - id: {id}')

        id += 1

asyncio.get_event_loop().run_until_complete(main())

I expected to see some of the logs out of order:

reading - id: 1
saving - id: 1      (saving 1 and reading 2 run in parallel)
reading - id: 2
saved - id: 1
saving - id: 2
reading - id: 3
saved - id: 2
.....

But the actual logs are the same as with the synchronous code:

reading - id: 1
saving - id: 1
saved - id: 1
reading - id: 2
saving - id: 2
saved - id: 2
reading - id: 3
.....
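That sequential ordering is expected: each await in your loop suspends main() until the wrapped call finishes, so the next call can never start early. A minimal, self-contained timing sketch reproduces the effect (blocking_io is a hypothetical stand-in for read_db / to_parquet, using time.sleep in place of real I/O):

```python
import asyncio
import time
from functools import partial, wraps

def async_wrap(f):
    # same wrapper as above: run a blocking function in the default executor
    @wraps(f)
    async def run(*args, loop=None, executor=None, **kwargs):
        if loop is None:
            loop = asyncio.get_event_loop()
        return await loop.run_in_executor(executor, partial(f, *args, **kwargs))
    return run

@async_wrap
def blocking_io(seconds):
    time.sleep(seconds)  # stand-in for the real database/parquet I/O

async def main():
    start = time.monotonic()
    await blocking_io(0.2)  # main() is suspended until this finishes...
    await blocking_io(0.2)  # ...so this one cannot start any earlier
    return time.monotonic() - start

elapsed = asyncio.run(main())
print(f'{elapsed:.2f}s')  # roughly 0.4s: the calls ran back to back, not overlapped
```

Wrapping a function in a coroutine only helps if something else is scheduled while it runs; awaiting it immediately recreates the synchronous behavior.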

You could make read_db(n+1) and to_parquet(n) run concurrently by using asyncio.gather or equivalent:

async def main():
    read_db_async = async_wrap(read_db)
    prev_to_parquet = asyncio.sleep(0)  # no-op

    for id in range(1, 1000):
        data, _ = await asyncio.gather(read_db_async(id), prev_to_parquet)
        to_parquet_async = async_wrap(data.to_parquet)
        prev_to_parquet = to_parquet_async(f'{id}.parquet')

    await prev_to_parquet
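Put together with stub functions, the overlap becomes visible. In this sketch, read_db and to_parquet are hypothetical stand-ins that just sleep and record what they do, so the recorded order can show reading n+1 starting before saved n, roughly the interleaving you expected:

```python
import asyncio
import time
from functools import partial, wraps

def async_wrap(f):
    @wraps(f)
    async def run(*args, loop=None, executor=None, **kwargs):
        if loop is None:
            loop = asyncio.get_event_loop()
        return await loop.run_in_executor(executor, partial(f, *args, **kwargs))
    return run

log = []  # recorded instead of logging.info so the ordering is easy to inspect

def read_db(id):
    log.append(f'reading {id}')
    time.sleep(0.05)  # stand-in for a slow database read
    return f'data-{id}'

def to_parquet(data, path):
    log.append(f'saving {path}')
    time.sleep(0.05)  # stand-in for a slow file write
    log.append(f'saved {path}')

read_db_async = async_wrap(read_db)
to_parquet_async = async_wrap(to_parquet)

async def main():
    prev_to_parquet = asyncio.sleep(0)  # no-op placeholder for the first iteration

    for id in range(1, 4):
        # read_db(id) and the previous iteration's to_parquet run at the same time
        data, _ = await asyncio.gather(read_db_async(id), prev_to_parquet)
        prev_to_parquet = to_parquet_async(data, f'{id}.parquet')

    await prev_to_parquet  # don't forget to flush the last write

asyncio.run(main())
print(log)
```

Note that the write for iteration n is only *created* inside the loop; it is actually scheduled by the gather call of iteration n+1, side by side with the next read, which is exactly the pairing the question asks for.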
