How to retry the task when aiohttp.ClientSession fails in async

I am struggling to understand this behavior as I am new to async functions in Python.

I am trying to create a simple download tool, and I have this function:

async def download_all_pages(sites):
    print('Running download all pages')
    try:
        async with aiohttp.ClientSession() as session:
            tasks = [asyncio.ensure_future(safe_download_page(session,url)) for url in sites]
            await asyncio.gather(*tasks, return_exceptions = True)
            try:
                await asyncio.sleep(0.25)
            except asyncio.CancelledError:
                print("Got CancelledError")
    except (aiohttp.ServerDisconnectedError, aiohttp.ClientResponseError,aiohttp.ClientConnectorError) as s:
        print("Oops, the server connection was dropped before we finished.")
        print(s)

I call this function as follows:

try:
    loop.run_until_complete(download_all_pages([url+'/'+str(i) for i in range(1, nb_pages+1)]))
    loop.run_until_complete(download_all_sites([result['href'] for result in results]))
finally:
    loop.run_until_complete(loop.shutdown_asyncgens())
    loop.close()
print('Finished at '+str(datetime.timestamp(datetime.now())))

Whenever I get an error, in this example mainly an aiohttp.ServerDisconnectedError, the output shows:

Oops, the server connection was dropped before we finished.
Server disconnected
Finished at 1606440807.007339
Task was destroyed but it is pending!
Task was destroyed but it is pending!
Task was destroyed but it is pending!
Task was destroyed but it is pending!
Task was destroyed but it is pending!
Task was destroyed but it is pending!

... followed by countless more Task was destroyed but it is pending! lines.

So when this error occurs, I don't want the function to finish, since there are still a lot of tasks left to run; hence the Task was destroyed but it is pending! messages.

And as you can see, it calls print('Finished at') before it even calls loop.run_until_complete(download_all_sites([result['href']); it seems to exit the whole script. (EDIT: I think I found out why this happens. Because the call sits inside the try: above, a failure sends execution straight to the finally: clause, which closes the loop and destroys the pending tasks. Still, the question of how to avoid the whole disconnect issue remains.)

Do you have any idea how I could safely retry the task that failed with aiohttp.ServerDisconnectedError?

Does this have to do with not using if __name__ == "__main__": ?

Does this have to do with not using if __name__ == "__main__": ?

It doesn't have to do with not using if __name__ == "__main__". It has to do with not handling exceptions in the right place. asyncio.gather() schedules the given awaitables as tasks and returns a list of their results, in the order they were passed. By default (return_exceptions=False), if any of those tasks raises an exception, gather() immediately propagates that exception to its caller without waiting for the remaining tasks to finish; the remaining tasks are not cancelled and keep running.
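
To make the two gather() modes concrete, here is a minimal sketch that uses only the standard library. The coroutines ok and boom are hypothetical stand-ins for downloads, and ConnectionError stands in for an aiohttp error; none of these names come from the question's code.

```python
import asyncio


async def ok(delay, value):
    # Hypothetical stand-in for a download that succeeds after `delay` seconds.
    await asyncio.sleep(delay)
    return value


async def boom():
    # Hypothetical stand-in for a download that fails quickly.
    await asyncio.sleep(0.01)
    raise ConnectionError("server disconnected")


async def demo_default():
    # Default mode: the first exception propagates out of gather().
    slow = asyncio.ensure_future(ok(0.2, "slow"))
    try:
        await asyncio.gather(slow, boom())
    except ConnectionError:
        # gather() did not wait for the slow task and did not cancel it.
        still_pending = not slow.done()
        # Clean up so nothing is left pending when the loop closes.
        slow.cancel()
        await asyncio.gather(slow, return_exceptions=True)
        return still_pending


async def demo_return_exceptions():
    # With return_exceptions=True, exceptions come back as items in the list.
    return await asyncio.gather(ok(0.01, "fast"), boom(), return_exceptions=True)


if __name__ == "__main__":
    print(asyncio.run(demo_default()))
    print(asyncio.run(demo_return_exceptions()))
```

In the first demo, the slow task is still pending when the exception reaches the caller. Closing the event loop at that point is the kind of situation that produces "Task was destroyed but it is pending!".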

You should handle exceptions in the function you didn't show, safe_download_page. Use try there, catch the aiohttp-related exceptions you can recover from, and retry on error (in a loop, with a sleep between iterations). Something like this (untested):

async def download_all_pages(sites):
    print('Running download all pages')
    async with aiohttp.ClientSession() as session:
        tasks = [asyncio.ensure_future(safe_download_page(session,url)) for url in sites]
        await asyncio.gather(*tasks)
        try:
            await asyncio.sleep(0.25)
        except asyncio.CancelledError:
            print("Got CancelledError")

async def safe_download_page(session, url):
    # sem and download_page come from the asker's code (not shown);
    # sem is presumably an asyncio.Semaphore that limits concurrency.
    while True:
        try:
            async with sem:
                await download_page(session, url)
                break  # success: stop retrying
        except (aiohttp.ServerDisconnectedError,
                aiohttp.ClientResponseError,
                aiohttp.ClientConnectorError) as s:
            print("Oops, the server connection was dropped on ", url, ": ", s)
            await asyncio.sleep(1)  # don't hammer the server
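
The loop above retries forever, which may not be what you want if the error is permanent. As a variation, here is a rough sketch of a bounded retry with exponential backoff. The helper retry_async, its parameters, and the make_flaky stand-in are hypothetical names invented for this illustration; ConnectionError is used in place of the aiohttp exceptions so the sketch runs with the standard library alone.

```python
import asyncio


async def retry_async(make_call, retry_on, attempts=5, base_delay=0.5):
    """Await make_call() up to `attempts` times, backing off between failures.

    make_call: zero-argument callable returning a fresh awaitable each time.
    retry_on:  exception class (or tuple of classes) considered recoverable.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            delay = base_delay * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay}s")
            await asyncio.sleep(delay)


def make_flaky(failures):
    """Hypothetical stand-in for a download that fails `failures` times."""
    state = {"calls": 0}

    async def flaky():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("server disconnected")
        return state["calls"]  # the call number that finally succeeded

    return flaky


if __name__ == "__main__":
    # Fails twice, then succeeds on the third call.
    print(asyncio.run(retry_async(make_flaky(2), ConnectionError, base_delay=0.01)))
```

With aiohttp, the same idea would presumably look like passing lambda: download_page(session, url) as make_call and a tuple such as (aiohttp.ServerDisconnectedError, aiohttp.ClientConnectorError) as retry_on. Because the last failure is re-raised, the caller still has to decide what to do with a URL that never succeeds.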
