How do I get asyncio.gather() with return_exceptions=True to complete the queue of tasks when the exception count exceeds the worker count?

Question

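I'm using asyncio with httpx.AsyncClient for the first time and trying to figure out how to complete my list of tasks when some of them may fail. I'm using a pattern I've found in a few places, where I populate an asyncio Queue with coroutine functions and have a set of workers process that queue from inside asyncio.gather. Normally, if the function doing the work raises an exception, you'll see the whole script fail during that processing, reporting the exception along with a RuntimeWarning: coroutine foo was never awaited, indicating that you never finished your list.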

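I found the return_exceptions option for asyncio.gather, and that helped, but not completely. My script still dies once I've received as many exceptions as the total number of workers I passed to the call to gather. The following is a simple script that demonstrates the problem.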

from httpx import AsyncClient, Timeout
from asyncio import run, gather, Queue as asyncio_Queue
from random import choice


async def process_url(client, url):
    """
    opens the URL and pulls a header attribute
    randomly raises an exception to demonstrate my problem
    """
    if choice([True, False]):
        await client.get(url)
        print(f'retrieved url {url}')
    else:
        raise AssertionError(f'generated error for url {url}')


async def main(worker_count, urls):
    """
    orchestrates the workers that call process_url
    """
    httpx_timeout = Timeout(10.0, read=20.0)
    async with AsyncClient(timeout=httpx_timeout, follow_redirects=True) as client:
        tasks = asyncio_Queue(maxsize=0)
        for url in urls:
            await tasks.put(process_url(client, url))

        async def worker():
            while not tasks.empty():
                await tasks.get_nowait()

        results = await gather(*[worker() for _ in range(worker_count)], return_exceptions=True)
        return results

if __name__ == '__main__':
    urls = ['https://stackoverflow.com/questions',
            'https://stackoverflow.com/jobs',
            'https://stackoverflow.com/tags',
            'https://stackoverflow.com/users',
            'https://www.google.com/',
            'https://www.bing.com/',
            'https://www.yahoo.com/',
            'https://www.foxnews.com/',
            'https://www.cnn.com/',
            'https://www.npr.org/',
            'https://www.opera.com/',
            'https://www.mozilla.org/en-US/firefox/',
            'https://www.google.com/chrome/',
            'https://www.epicbrowser.com/'
            ]
    print(f'processing {len(urls)} urls')
    run_results = run(main(4, urls))
    print('\n'.join([str(rr) for rr in run_results]))

Output from one run of this script:

processing 14 urls
retrieved url https://stackoverflow.com/tags
retrieved url https://stackoverflow.com/jobs
retrieved url https://stackoverflow.com/users
retrieved url https://www.bing.com/
generated error for url https://stackoverflow.com/questions
generated error for url https://www.foxnews.com/
generated error for url https://www.google.com/
generated error for url https://www.yahoo.com/
sys:1: RuntimeWarning: coroutine 'process_url' was never awaited

Process finished with exit code 0

Here you can see that we got through 8 of the 14 urls, but once we hit 4 errors, the script wrapped up and ignored the rest of the urls.

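What I want is for the script to work through the complete set of urls and inform me of the errors at the end. Is there a way to do that here? It may be that I have to wrap everything in process_url() in a try/except block and use something like aiofile to dump the errors out at the end?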

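Update: To be clear, this demo script is a simplification of what I'm really doing. My real script hits a small number of server API endpoints a few hundred thousand times. The purpose of using a set of workers is to avoid overwhelming the server I'm hitting [it's a test server, not production, so it isn't intended to handle huge volumes of requests, though the number is greater than 4 8-)]. I'm open to learning about alternatives.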

Answer 1

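The program design you've outlined should work OK, but you must prevent the tasks (instances of your worker function) from crashing. The listing below shows one way to do that.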

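Your Queue is named "tasks", but the items you put in it aren't tasks; they are coroutines. As it stands, your program has five tasks: one of them is the main function, which is made into a task by asyncio.run(). The other four tasks are instances of worker, which are made into tasks by asyncio.gather.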
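To illustrate the distinction between a coroutine and a task, here is a minimal sketch. The `job` function is a hypothetical stand-in for process_url() and does no network I/O; it is not part of your script.

```python
import asyncio


async def job(n):
    # Hypothetical stand-in for process_url(); no network access.
    await asyncio.sleep(0)
    return n * 2


async def demo():
    coro = job(1)                        # a coroutine object: nothing runs yet
    is_coro = asyncio.iscoroutine(coro)
    is_task_before = isinstance(coro, asyncio.Task)

    task = asyncio.create_task(coro)     # wrapping it schedules it on the event loop
    is_task_after = isinstance(task, asyncio.Task)
    result = await task
    return is_coro, is_task_before, is_task_after, result


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A coroutine object that is created but never awaited or wrapped in a task is what produces the "was never awaited" warning when it is garbage-collected.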

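When worker awaits a coroutine and that coroutine crashes, the exception propagates into worker at the await statement. Because the exception isn't handled, worker crashes in turn. With return_exceptions=True, gather records that exception as the worker's result, but the worker itself is finished and never pulls another item from the queue; after four failures there are no workers left. To prevent that, do something like this: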

async def worker():
    while not tasks.empty():
        try:
            await tasks.get_nowait()
        except Exception:
            pass
            # You might want to do something more intelligent here
            # (logging, perhaps), rather than simply suppressing the exception

That should allow your example program to run to completion.
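If you also want the errors reported at the end, one possible approach is to record them in the except block instead of suppressing them. The following is only a sketch under assumptions: `flaky` is a hypothetical stand-in for process_url() that fails deterministically on odd inputs, `run_all` is an invented name, and the queue holds plain items rather than coroutine objects so that nothing is left un-awaited.

```python
import asyncio


async def flaky(n):
    # Hypothetical stand-in for process_url(): odd inputs fail, even ones succeed.
    await asyncio.sleep(0)
    if n % 2:
        raise ValueError(f"generated error for item {n}")
    return n


async def run_all(worker_count, items):
    queue = asyncio.Queue()
    for item in items:
        queue.put_nowait(item)           # queue plain data, not coroutine objects

    successes, failures = [], []

    async def worker():
        while not queue.empty():
            item = queue.get_nowait()
            try:
                successes.append(await flaky(item))
            except Exception as exc:     # record the failure and keep working
                failures.append((item, exc))

    await asyncio.gather(*(worker() for _ in range(worker_count)))
    return successes, failures


if __name__ == "__main__":
    ok, failed = asyncio.run(run_all(4, range(14)))
    print(f"{len(ok)} succeeded, {len(failed)} failed")
    for item, exc in failed:
        print(f"  {item}: {exc!r}")
```

Because each worker catches its own exceptions, gather no longer needs return_exceptions=True here, and the number of workers still caps how many requests are in flight at once.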
