Why doesn't this asyncio semaphore implementation work with aiohttp in Python

I first make a simple request to fetch a JSON containing all the names, then iterate over the names and make an awaitable call for each one, storing them in a list called "tasks", and then I gather all of them.

The problem is that the responding server has a per-minute limit on API responses, and no matter how low I set the semaphore value, this code takes the same amount of time (too short to satisfy the server's limit) to make the API calls, as if the semaphore weren't there at all. How do I control the API call rate?

import asyncio

import aiohttp
import requests

# <some code> -- headers is assumed to be defined in the elided code
url = "http://example.com/"
response = requests.request("GET", url, headers=headers)

async def get_api(session, url_dev):
    async with session.get(url_dev, headers = headers) as resp:
        result = await resp.json()
        return result

async def main():
    async with aiohttp.ClientSession() as session:
        sem = asyncio.Semaphore(1)
        tasks = []
        for i in response.json()["Names"]:

            url_dev = "https://example.com/example/" + str(i["Id"])
            await sem.acquire()
            async with sem:
                tasks.append(asyncio.create_task(get_api(session, url_dev)))


        full_list = list()
        async with sem:
            full_list = await asyncio.gather(*tasks)

asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
asyncio.run(main())

A semaphore really isn't the right tool for managing rate limits here, unless you increment the semaphore in a separate loop, or add a sleep inside the critical section. You could also schedule a follow-up task that sleeps and then releases the semaphore.

Additionally, you enqueue all of your tasks inside the critical section, but the execution happens asynchronously to the critical section, because you enqueue each one as a task. You need to have the semaphore inside the get_api method.

Also, you are acquiring the semaphore twice; either use the acquire method with try/finally, or use async with, but not both. See the docs.
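For reference, the two correct acquisition patterns mentioned above can be sketched like this (a minimal sketch with illustrative names, not from the original code):

```python
import asyncio

async def with_context_manager(sem):
    # "async with" acquires on entry and releases on exit -- the usual form.
    async with sem:
        return "context manager"

async def with_try_finally(sem):
    # Equivalent manual form: acquire once, guarantee the release in finally.
    await sem.acquire()
    try:
        return "try/finally"
    finally:
        sem.release()

async def demo():
    sem = asyncio.Semaphore(1)
    # Each coroutine acquires exactly once; mixing both forms in one coroutine
    # would deadlock a Semaphore(1) on the second acquire.
    return [await with_context_manager(sem), await with_try_finally(sem)]

results = asyncio.run(demo())
print(results)
```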

Here is a simple script illustrating how to have a task loop that starts no more than 5 tasks per 5-second interval:

import asyncio


async def dequeue(sem, sleep):
    """Wait for a duration and then increment the semaphore"""
    try:
        await asyncio.sleep(sleep)
    finally:
        sem.release()


async def task(sem, sleep, data):
    """Decrement the semaphore, schedule an increment, and then work"""
    await sem.acquire()
    asyncio.create_task(dequeue(sem, sleep))
    # logic here
    print(data)


async def main():
    max_concurrent = 5
    sleep = 5

    sem = asyncio.Semaphore(max_concurrent)
    tasks = [asyncio.create_task(task(sem, sleep, i)) for i in range(15)]
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    asyncio.run(main())

If you want to get really fancy, you can also wrap this logic in a decorator:

import asyncio
import functools


def rate_limited(max_concurrent, duration):
    def decorator(func):
        semaphore = asyncio.Semaphore(max_concurrent)

        async def dequeue():
            try:
                await asyncio.sleep(duration)
            finally:
                semaphore.release()

        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            await semaphore.acquire()
            asyncio.create_task(dequeue())
            return await func(*args, **kwargs)

        return wrapper
    return decorator

The code then becomes the following (note that the semaphore is created outside asyncio.run, so you need to query the default loop for this to work properly):

@rate_limited(max_concurrent=5, duration=5)
async def task(i):
    print(i)


async def main():
    tasks = [asyncio.create_task(task(i)) for i in range(7)]
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
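A side note on that caveat: on Python 3.10 and later, asyncio primitives no longer bind to an event loop at construction time, so plain asyncio.run should also work; on older versions, one way to sidestep the caveat is a variant of the decorator that creates its semaphore lazily, on first call inside the running loop (a sketch, with illustrative parameter values):

```python
import asyncio
import functools

def rate_limited(max_concurrent, duration):
    """Variant of the decorator above: the semaphore is created lazily,
    inside the running event loop, so plain asyncio.run(main()) works
    on older Python versions as well."""
    def decorator(func):
        semaphore = None  # created on first call, inside the event loop

        async def dequeue():
            try:
                await asyncio.sleep(duration)
            finally:
                semaphore.release()

        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            nonlocal semaphore
            if semaphore is None:
                semaphore = asyncio.Semaphore(max_concurrent)
            await semaphore.acquire()
            asyncio.create_task(dequeue())
            return await func(*args, **kwargs)

        return wrapper
    return decorator

@rate_limited(max_concurrent=2, duration=0.1)
async def work(i):
    return i * 2

async def main():
    return await asyncio.gather(*(work(i) for i in range(4)))

results = asyncio.run(main())
print(results)
```

Because the semaphore now comes into existence inside the loop, the get_event_loop/run_until_complete workaround is no longer needed.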

You should acquire and release the semaphore object when you run the request against the API endpoint in get_api, not when you create the tasks and gather the results. Also, for your example use case, there is no need to call sem.acquire or sem.release manually when you use its context manager:

async def get_api(session, sem: asyncio.Semaphore, url_dev):
    # below, using both the semaphore and session.get in a context manager:
    # now the semaphore will properly block requests when the limit has been
    # reached, until others have finished
    async with sem, session.get(url_dev, headers=headers) as resp:
        result = await resp.json()
        return result

async def main():
    sem = asyncio.Semaphore(1)
    async with aiohttp.ClientSession() as session:
        tasks = []
        for i in response.json()["Names"]:
            url_dev = "https://example.com/example/" + str(i["Id"])
            # passing the semaphore instance to get_api
            tasks.append(asyncio.create_task(get_api(session, sem, url_dev)))
        full_list = await asyncio.gather(*tasks)
