
Asyncio in python: How can I wait for some files to finish, while running coroutines

I am creating a program that rsyncs data from multiple remote stations (~200). Sometimes there are multiple files per station, and there are groups of stations that must NOT be contacted at the same time or else the connection will close. Currently, I have an asynchronous routine (using asyncio in Python) that rsyncs all of the files at once (to their respective stations). This results in the connection being closed if we get a backlog of files or contact stations in the same group at the same time.

What I need to do is create grouped tasks: within a station group, each file waits for the previous file to finish updating (updateFile()) before starting, while all of the station groups run asynchronously at the same time.
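For concreteness, the file_groups mapping used in the code below is assumed to look something like this (the key 'none' marks stations that belong to no group; the exact shape and the names are my guess, inferred from how the loop unpacks each entry):

```python
# Hypothetical shape of file_groups, inferred from the code below:
# group name -> list of (order, fqdn, file) tuples.
# Stations under "none" have no group and may be contacted freely.
file_groups = {
    "siteA": [(1, "stn1.example.org", "data1.bin"),
              (2, "stn2.example.org", "data2.bin")],
    "none":  [(1, "stn9.example.org", "data9.bin")],
}
```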

I'm new to asynchronous programming and just cannot figure out how to make this work.

I've currently managed to run everything asynchronously.

Start the loop:

loop = asyncio.get_event_loop()

Create grouped station tasks and individual station tasks:

tasks = []
for group, files in file_groups.items():
    files_in_task = []
    for order, fqdn, file in files:
        if group == 'none':
            futures = [updateFile(file, fqdn)]
            tasks.append(asyncio.gather(*futures))
        else:  # In Group
            x = (file, fqdn)
            files_in_task.append(x)
    futures = [updateFile(file, fqdn) for (file, fqdn) in files_in_task]
    tasks.append(asyncio.wait(*futures))

Run the event loop until all tasks have returned:

loop.run_until_complete(asyncio.wait(tasks))
loop.close()

If my understanding is correct, you want to implement groups that run in parallel, with each individual group running its elements in sequence.

A coroutine that updates a single group would consist of a simple loop that uses await to run the individual elements sequentially:

async def update_group(files):
    # update a single group, by running updateFiles in series
    for _order, fqdn, filename in files:
        await updateFile(filename, fqdn)

Updating all groups requires running multiple instances of update_group() in parallel. In asyncio the unit of parallelism is a task, so we create one for each update_group() and use asyncio.gather to wait for all the tasks to finish, allowing them to run in parallel:

async def update_all_groups(file_groups):
    # update all the different groups in parallel
    tasks = []
    for _group, files in file_groups.items():
        tasks.append(asyncio.create_task(update_group(files)))
    await asyncio.gather(*tasks)

Finally, the whole setup is invoked from sync code, ideally with a single call to asyncio.run, which sets up, runs, and closes the loop:

asyncio.run(update_all_groups(file_groups))
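Putting the pieces together, here is a minimal runnable sketch. The stub updateFile just sleeps briefly and records its call instead of running rsync, so we can observe that files within a group complete in order while the groups themselves overlap; everything except update_group/update_all_groups is illustrative scaffolding, not part of the original program:

```python
import asyncio

completed = []  # records completion order, for demonstration only


async def updateFile(filename, fqdn):
    # Stub standing in for the real rsync call.
    await asyncio.sleep(0.01)
    completed.append((fqdn, filename))


async def update_group(files):
    # Update a single group: files run strictly in series.
    for _order, fqdn, filename in files:
        await updateFile(filename, fqdn)


async def update_all_groups(file_groups):
    # Each group gets its own task; the groups run in parallel.
    tasks = [asyncio.create_task(update_group(files))
             for files in file_groups.values()]
    await asyncio.gather(*tasks)


# Hypothetical input data in the assumed (order, fqdn, file) shape.
file_groups = {
    "groupA": [(1, "host-a1", "a1.dat"), (2, "host-a2", "a2.dat")],
    "groupB": [(1, "host-b1", "b1.dat")],
}

asyncio.run(update_all_groups(file_groups))
```

After the run, completed contains all three files, and a1.dat is guaranteed to appear before a2.dat because they share a group, while b1.dat may land anywhere relative to them.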
