Use asyncio coroutine to run functions in parallel?
I have the following code, which reads data from a database (read_db) and writes it to a Parquet file (data.to_parquet). Both I/O operations take a while to run.
import logging

def main():
    id = 1  # starting id, matching the logs shown later
    while id < 1000:
        logging.info(f'reading - id: {id}')
        data = read_db(id)  # returns a dataframe
        logging.info(f'saving - id: {id}')
        data.to_parquet(f'{id}.parquet')
        logging.info(f'saved - id: {id}')
        id += 1
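It is slow, so I would like read_db(n+1) and to_parquet(n) to run concurrently. Each step still has to complete in id order, though: read_db(n+1) must run after read_db(n), and data.to_parquet(n+1) must run after data.to_parquet(n). Here is the async version: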
import asyncio
import logging
from functools import partial, wraps

def async_wrap(f):
    # Wrap a blocking function so it runs in an executor and can be awaited.
    @wraps(f)
    async def run(*args, loop=None, executor=None, **kwargs):
        if loop is None:
            loop = asyncio.get_event_loop()
        p = partial(f, *args, **kwargs)
        return await loop.run_in_executor(executor, p)
    return run

async def main():
    read_db_async = async_wrap(read_db)
    id = 1
    while id < 1000:
        logging.info(f'reading - id: {id}')
        data = await read_db_async(id)  # returns a dataframe
        logging.info(f'saving - id: {id}')
        to_parquet_async = async_wrap(data.to_parquet)
        await to_parquet_async(f'{id}.parquet')  # await the wrapped call, not data.to_parquet
        logging.info(f'saved - id: {id}')
        id += 1

asyncio.get_event_loop().run_until_complete(main())
I expected to see interleaved logs like these:
reading - id: 1
saving - id: 1 (saving 1 and reading 2 run in parallel)
reading - id: 2
saved - id: 1
saving - id: 2
reading - id: 3
saved - id: 2
.....
However, the actual log is the same as with the synchronous code:
reading - id: 1
saving - id: 1
saved - id: 1
reading - id: 2
saving - id: 2
saved - id: 2
reading - id: 3
.....
You can use gather, or an equivalent, to make read_db(n+1) and to_parquet(n) run concurrently:
async def main():
    read_db_async = async_wrap(read_db)
    prev_to_parquet = asyncio.sleep(0)  # no-op
    for id in range(1, 1000):
        data, _ = await asyncio.gather(read_db_async(id), prev_to_parquet)
        to_parquet_async = async_wrap(data.to_parquet)
        prev_to_parquet = to_parquet_async(f'{id}.parquet')
    await prev_to_parquet
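As a rough way to check the pattern above without a database, the sketch below replaces read_db and to_parquet with hypothetical stand-ins (fake_read and fake_write, which only sleep and record events) and calls loop.run_in_executor directly instead of going through async_wrap. These names and the 0.05 s delays are assumptions for illustration, not part of the original code.

```python
import asyncio
import time

# Hypothetical stand-ins for the blocking read_db / DataFrame.to_parquet calls.
def fake_read(i, events):
    events.append(f"read-start {i}")
    time.sleep(0.05)  # simulate blocking I/O
    events.append(f"read-end {i}")
    return i

def fake_write(i, events):
    events.append(f"write-start {i}")
    time.sleep(0.05)  # simulate blocking I/O
    events.append(f"write-end {i}")

async def sequential(n, events):
    # Awaiting each step before starting the next one: nothing overlaps.
    loop = asyncio.get_running_loop()
    for i in range(1, n + 1):
        data = await loop.run_in_executor(None, fake_read, i, events)
        await loop.run_in_executor(None, fake_write, data, events)

async def pipelined(n, events):
    # read(i) is gathered together with the still-pending write(i-1).
    loop = asyncio.get_running_loop()
    prev_write = asyncio.sleep(0)  # no-op for the first iteration
    for i in range(1, n + 1):
        data, _ = await asyncio.gather(
            loop.run_in_executor(None, fake_read, i, events),
            prev_write,
        )
        prev_write = loop.run_in_executor(None, fake_write, data, events)
    await prev_write  # wait for the last write to finish

def timed(coro_fn, n):
    # Run one variant and return (elapsed seconds, recorded events).
    events = []
    start = time.perf_counter()
    asyncio.run(coro_fn(n, events))
    return time.perf_counter() - start, events

if __name__ == "__main__":
    seq_time, _ = timed(sequential, 4)
    pipe_time, pipe_events = timed(pipelined, 4)
    print(f"sequential: {seq_time:.2f}s, pipelined: {pipe_time:.2f}s")
    print(pipe_events)
```

In the sequential variant every await finishes before the next call is even scheduled, which is why the log in the question matches the synchronous one. In the pipelined variant each write is still pending when the next read starts, while reads stay in order among themselves and so do writes.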