Why doesn't this asyncio semaphore implementation work with aiohttp in python
I first make a simple request to get a JSON containing all the names, then iterate over all the names and make an asynchronous awaited call corresponding to each name, storing them in a list called "tasks", and then I gather all of them.

The problem is that the responding server has a per-minute limit on API responses, and no matter how low I set the semaphore value, this code takes the same amount of time (too short to satisfy the server's expectation) to make the API calls, as if the semaphore weren't there at all. How do I control the rate of the API calls?
<some code>
import asyncio
import aiohttp
import requests

url = "http://example.com/"
response = requests.request("GET", url, headers=headers)

async def get_api(session, url_dev):
    async with session.get(url_dev, headers=headers) as resp:
        result = await resp.json()
        return result

async def main():
    async with aiohttp.ClientSession() as session:
        sem = asyncio.Semaphore(1)
        tasks = []
        for i in response.json()["Names"]:
            url_dev = "https://example.com/example/" + str(i["Id"])
            await sem.acquire()
            async with sem:
                tasks.append(asyncio.create_task(get_api(session, url_dev)))
        full_list = list()
        async with sem:
            full_list = await asyncio.gather(*tasks)

asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
asyncio.run(main())
A semaphore is really not the right tool for managing rate limits here, unless you increment the semaphore in a separate loop, or add a sleep inside the critical section. You could also schedule a follow-up task that sleeps and then releases the semaphore.

Additionally, you have enqueued all the tasks inside the critical section, but the execution happens asynchronously with respect to the critical section, because you enqueue the work as a task. You need to hold the semaphore inside the get_api method instead.
Additionally, you are acquiring the semaphore twice: either use the acquire method with try / finally, or use async with, but not both. See the docs.
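To make the contrast concrete, here is a minimal sketch of the two correct, equivalent patterns (pick one, never both; the function names are illustrative only):

```python
import asyncio

async def with_context_manager():
    sem = asyncio.Semaphore(1)
    # Pattern 1: async with acquires on entry and releases on exit.
    async with sem:
        return "ok"

async def with_try_finally():
    sem = asyncio.Semaphore(1)
    # Pattern 2: manual acquire paired with a release in a finally block,
    # so the semaphore is released even if the work raises.
    await sem.acquire()
    try:
        return "ok"
    finally:
        sem.release()

print(asyncio.run(with_context_manager()))
print(asyncio.run(with_try_finally()))
```

The questioner's code mixes both patterns, so each loop iteration tries to take the semaphore twice.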
Here is a simple script to illustrate how you can have a task loop that starts no more than 5 tasks per 5-second interval:
import asyncio

async def dequeue(sem, sleep):
    """Wait for a duration and then increment the semaphore"""
    try:
        await asyncio.sleep(sleep)
    finally:
        sem.release()

async def task(sem, sleep, data):
    """Decrement the semaphore, schedule an increment, and then work"""
    await sem.acquire()
    asyncio.create_task(dequeue(sem, sleep))
    # logic here
    print(data)

async def main():
    max_concurrent = 5
    sleep = 5
    sem = asyncio.Semaphore(max_concurrent)
    tasks = [asyncio.create_task(task(sem, sleep, i)) for i in range(15)]
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())
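A quick way to convince yourself the pattern above really throttles is to time it with a shorter window; the 0.2-second window and the 15-task count below are arbitrary test values, not something from the original answer:

```python
import asyncio
import time

async def dequeue(sem, sleep):
    """Wait for a duration and then increment the semaphore."""
    try:
        await asyncio.sleep(sleep)
    finally:
        sem.release()

async def task(sem, sleep, data):
    """Decrement the semaphore, schedule an increment, then work."""
    await sem.acquire()
    asyncio.create_task(dequeue(sem, sleep))
    return data

async def timed_run(n_tasks, max_concurrent, window):
    sem = asyncio.Semaphore(max_concurrent)
    start = time.monotonic()
    await asyncio.gather(*(task(sem, window, i) for i in range(n_tasks)))
    return time.monotonic() - start

# 15 tasks, at most 5 per 0.2 s window: batches start at roughly
# t=0, t=0.2 and t=0.4, so the run must span at least two full windows.
elapsed = asyncio.run(timed_run(15, 5, 0.2))
print(f"elapsed: {elapsed:.2f}s")
```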
You can also wrap this logic in a decorator if you want to get really fancy:
import asyncio
import functools

def rate_limited(max_concurrent, duration):
    def decorator(func):
        semaphore = asyncio.Semaphore(max_concurrent)

        async def dequeue():
            try:
                await asyncio.sleep(duration)
            finally:
                semaphore.release()

        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            await semaphore.acquire()
            asyncio.create_task(dequeue())
            return await func(*args, **kwargs)
        return wrapper
    return decorator
The code then becomes the following (note that the semaphore is created outside of asyncio.run, so you need to query the default loop for this to work properly):
@rate_limited(max_concurrent=5, duration=5)
async def task(i):
    print(i)

async def main():
    tasks = [asyncio.create_task(task(i)) for i in range(7)]
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
You should acquire and release the semaphore object inside get_api, where you actually run the request to the API endpoint, not while creating the tasks and gathering the results. Also, for your example use case, there is no need to call sem.acquire and sem.release manually when you use the semaphore's context manager instead:
async def get_api(session, sem: asyncio.Semaphore, url_dev):
    # below, using both the semaphore and session.get in a context manager
    # now, the semaphore will properly block requests when the limit has
    # been reached, until others have finished
    async with sem, session.get(url_dev, headers=headers) as resp:
        result = await resp.json()
        return result

async def main():
    sem = asyncio.Semaphore(1)
    async with aiohttp.ClientSession() as session:
        tasks = []
        for i in response.json()["Names"]:
            url_dev = "https://example.com/example/" + str(i["Id"])
            # passing the semaphore instance to get_api
            tasks.append(asyncio.create_task(get_api(session, sem, url_dev)))
        full_list = await asyncio.gather(*tasks)
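The effect of holding the semaphore inside get_api can be demonstrated without a live server. This sketch replaces session.get with a stub (fake_fetch and the counters are invented for illustration, not part of aiohttp) and records how many requests are in flight at once:

```python
import asyncio

async def fake_fetch(url):
    """Stand-in for session.get(...): pretend the request takes a moment."""
    await asyncio.sleep(0.05)
    return {"url": url}

in_flight = 0      # requests currently running
max_in_flight = 0  # highest concurrency observed

async def get_api(sem, url):
    global in_flight, max_in_flight
    # Hold the semaphore around the request itself, as in the answer above.
    async with sem:
        in_flight += 1
        max_in_flight = max(max_in_flight, in_flight)
        result = await fake_fetch(url)
        in_flight -= 1
        return result

async def main():
    sem = asyncio.Semaphore(2)  # at most 2 requests in flight at a time
    urls = [f"https://example.com/example/{i}" for i in range(10)]
    return await asyncio.gather(*(get_api(sem, u) for u in urls))

results = asyncio.run(main())
print(len(results), max_in_flight)
```

Note that this caps *concurrency*, not requests per minute; to honor a per-minute quota you would combine it with the timed-release (dequeue) approach from the other answer.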