
Asyncio in Python: How can I wait for some files to finish while running coroutines?

I am creating a program that rsyncs data from multiple remote stations (~200). Sometimes there are multiple files per station, and there are groups of stations that should NOT be contacted at the same time or else the connection will close. Currently, I have an asynchronous routine (using asyncio in Python) that asynchronously rsyncs all of the files at once (to their respective stations). This results in closed connections when we get a backlog of files or contact stations in the same group at the same time.

What I need to do is create grouped tasks, where within a station group each file waits for the previous file to finish updating (updateFile()) before starting the next one, while all of the station groups run asynchronously at the same time.

I'm new to asynchronous programming and just cannot figure out how to make this work.

I've currently managed to run everything asynchronously.

Start Loop

loop = asyncio.get_event_loop()

Create grouped station tasks and individual station tasks

tasks = []
for group, files in file_groups.items():
    files_in_task = []
    for order, fqdn, file in files:
        if group == 'none':
            futures = [updateFile(file, fqdn)]
            tasks.append(asyncio.gather(*futures))
        else:  # In Group
            x = (file, fqdn)
            files_in_task.append(x)
        futures = [updateFile(file, fqdn) for (file, fqdn) in files_in_task]
        tasks.append(asyncio.wait(*futures))

Run the event loop until all tasks have returned.

loop.run_until_complete(asyncio.wait(tasks))
loop.close()
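For context, file_groups is a dict mapping a group name ('none' for ungrouped stations) to a list of (order, fqdn, file) tuples, roughly like the sketch below (the station names and paths are just placeholders):

# Illustrative shape of file_groups; the hostnames and paths are made up.
file_groups = {
    'none':   [(1, 'station001.example.com', '/data/a.dat')],
    'groupA': [(1, 'station002.example.com', '/data/b.dat'),
               (2, 'station002.example.com', '/data/c.dat')],
    'groupB': [(1, 'station003.example.com', '/data/d.dat')],
}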

If my understanding is correct, you want to implement groups that run in parallel, with each individual group running its elements in sequence.

A coroutine that updates a single group would consist of a simple loop using await to evaluate the individual elements sequentially:

async def update_group(files):
    # update a single group by running updateFile() calls in series
    for _order, fqdn, filename in files:
        await updateFile(filename, fqdn)
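For await updateFile(...) to work, updateFile() has to be a coroutine itself. It isn't shown in the question; a minimal sketch, assuming it shells out to rsync through asyncio's subprocess support (the rsync options and destination path below are made up for illustration), could look like this:

import asyncio

async def updateFile(filename, fqdn):
    # Hypothetical implementation: run rsync in a subprocess without
    # blocking the event loop, then wait for it to finish.
    proc = await asyncio.create_subprocess_exec(
        'rsync', '-az', f'{fqdn}:{filename}', '/local/destination/',
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    _stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f'rsync of {fqdn}:{filename} failed: {stderr.decode()}')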

Updating all groups requires running multiple instances of update_group() in parallel. In asyncio the unit of parallelism is a task, so we create one for each update_group(), and use asyncio.gather to wait for all the tasks to finish, allowing them to run in parallel:

async def update_all_groups(file_groups):
    # update all the different groups in parallel
    tasks = []
    for _group, files in file_groups.items():
        tasks.append(asyncio.create_task(update_group(files)))
    await asyncio.gather(*tasks)
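asyncio.gather also accepts bare coroutine objects and wraps them in tasks itself, so the explicit create_task() calls are a matter of style; the same function can be condensed to a single expression:

async def update_all_groups(file_groups):
    # gather() wraps the coroutines in tasks and runs them in parallel
    await asyncio.gather(*(update_group(files) for files in file_groups.values()))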

Finally, the whole setup is invoked from sync code, ideally with a single call to asyncio.run, which sets up, runs, and closes the loop:

asyncio.run(update_all_groups(file_groups))
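Note that asyncio.run is available on Python 3.7 and newer; on older versions the same call can be driven by an explicit event loop, much like in the original code:

loop = asyncio.get_event_loop()
loop.run_until_complete(update_all_groups(file_groups))
loop.close()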
