
Why doesn't this asyncio semaphore implementation work with aiohttp in python

I first make a simple request to get a JSON containing all the names, then I iterate over all the names and make the corresponding async/await call for each one, storing the resulting tasks in a list called "tasks", and then I gather all of them.

The problem is that the responding server has a limit on API responses per minute, and no matter how low I set the semaphore value, this code takes the same amount of time (too short to satisfy the server's limit) to make the API calls, as if the semaphore weren't there at all. How can I control the rate of the API calls?

<some code>
url = "http://example.com/"
response = requests.request("GET", url, headers=headers)

async def get_api(session, url_dev):
    async with session.get(url_dev, headers = headers) as resp:
        result = await resp.json()
        return result

async def main():
    async with aiohttp.ClientSession() as session:
        sem = asyncio.Semaphore(1)
        tasks = []
        for i in response.json()["Names"]:

            url_dev = "https://example.com/example/" + str(i["Id"])
            await sem.acquire()
            async with sem:
                tasks.append(asyncio.create_task(get_api(session, url_dev)))


        full_list = list()
        async with sem:
            full_list = await asyncio.gather(*tasks)

asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
asyncio.run(main())

A semaphore is really not the right tool for managing a rate limit here, unless you are going to increment the semaphore in a separate loop, or add a sleep inside the critical section. You could also schedule a follow-up task that sleeps and then releases the semaphore.

Also, you enqueue all of your tasks inside the critical section, but the execution happens asynchronously with respect to the critical section, because you enqueued it as a task. You need to hold the semaphore inside the get_api method instead.

Also, you are acquiring the semaphore twice: either use the acquire method with try/finally, or use async with, but not both. See the documentation.
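The two correct acquisition patterns can be sketched side by side like this (a minimal illustration with hypothetical names, not tied to aiohttp):

```python
import asyncio

async def with_try_finally(sem: asyncio.Semaphore, results: list):
    # Pattern 1: acquire() paired with release() in a try/finally,
    # so the permit is returned even if the body raises.
    await sem.acquire()
    try:
        results.append("worked")
    finally:
        sem.release()

async def with_context_manager(sem: asyncio.Semaphore, results: list):
    # Pattern 2: async with does the acquire/release bookkeeping for you.
    async with sem:
        results.append("worked")

async def demo():
    sem = asyncio.Semaphore(1)
    results = []
    # Both coroutines share the same semaphore; each uses exactly
    # one of the two patterns -- never both at once.
    await asyncio.gather(with_try_finally(sem, results),
                         with_context_manager(sem, results))
    return results
```

Mixing the patterns, as in the question's code, decrements the semaphore twice per logical acquisition, which deadlocks or starves later tasks.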

Here's a simple script illustrating how to have a task loop that starts no more than 5 tasks per 5-second interval:

import asyncio


async def dequeue(sem, sleep):
    """Wait for a duration and then increment the semaphore"""
    try:
        await asyncio.sleep(sleep)
    finally:
        sem.release()


async def task(sem, sleep, data):
    """Decrement the semaphore, schedule an increment, and then work"""
    await sem.acquire()
    asyncio.create_task(dequeue(sem, sleep))
    # logic here
    print(data)


async def main():
    max_concurrent = 5
    sleep = 5

    sem = asyncio.Semaphore(max_concurrent)
    tasks = [asyncio.create_task(task(sem, sleep, i)) for i in range(15)]
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    asyncio.run(main())

If you want to get really fancy, you can also wrap this logic in a decorator:

import asyncio
import functools


def rate_limited(max_concurrent, duration):
    def decorator(func):
        semaphore = asyncio.Semaphore(max_concurrent)

        async def dequeue():
            try:
                await asyncio.sleep(duration)
            finally:
                semaphore.release()

        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            await semaphore.acquire()
            asyncio.create_task(dequeue())
            return await func(*args, **kwargs)

        return wrapper
    return decorator

The code then becomes the following (note that the semaphore is created outside of asyncio.run, so you need to query the default loop for this to work properly):

@rate_limited(max_concurrent=5, duration=5)
async def task(i):
    print(i)


async def main():
    tasks = [asyncio.create_task(task(i)) for i in range(7)]
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
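If you'd rather keep using asyncio.run, one way around the loop-binding issue is to defer the Semaphore creation until the decorated function is first awaited, so it is constructed inside the running loop. This is a sketch under that assumption (on recent Python versions the primitives bind lazily anyway, so this mainly matters on older interpreters):

```python
import asyncio
import functools

def rate_limited(max_concurrent, duration):
    def decorator(func):
        semaphore = None  # created lazily, inside the running event loop

        async def dequeue():
            # Return the permit after the rate-limit window has elapsed.
            try:
                await asyncio.sleep(duration)
            finally:
                semaphore.release()

        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            nonlocal semaphore
            if semaphore is None:
                semaphore = asyncio.Semaphore(max_concurrent)
            await semaphore.acquire()
            asyncio.create_task(dequeue())
            return await func(*args, **kwargs)

        return wrapper
    return decorator

# Hypothetical usage: 2 calls per 0.1-second window.
@rate_limited(max_concurrent=2, duration=0.1)
async def job(i):
    return i

async def demo():
    return await asyncio.gather(*(job(i) for i in range(4)))
```

With this variant, `asyncio.run(demo())` works without touching the default loop, because no asyncio primitive exists until the loop is running.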

You should acquire and release the semaphore object while you run the request to the API endpoint inside get_api, not while you create the tasks and gather the results. Also, for your example use case, there is no need to manually call sem.acquire and sem.release when you use its context manager instead:

async def get_api(session, sem: asyncio.Semaphore, url_dev):
    # Using both the semaphore and session.get in a single context manager:
    # the semaphore now properly blocks new requests when the limit has
    # been reached, until others have finished.
    async with sem, session.get(url_dev, headers=headers) as resp:
        result = await resp.json()
        return result

async def main():
    sem = asyncio.Semaphore(1)
    async with aiohttp.ClientSession() as session:
        tasks = []
        for i in response.json()["Names"]:
            url_dev = "https://example.com/example/" + str(i["Id"])
            # passing the semaphore instance to get_api
            tasks.append(asyncio.create_task(get_api(session, sem, url_dev)))
        full_list = await asyncio.gather(*tasks)
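To convince yourself that holding the semaphore inside the request coroutine actually serializes the calls, you can swap the aiohttp request for a stub coroutine and track peak concurrency. Everything here (fake_get_api, the counter dict, demo) is a hypothetical stand-in for illustration, not part of the answer's code:

```python
import asyncio

async def fake_get_api(sem: asyncio.Semaphore, counter: dict, url: str):
    # Stand-in for the real session.get call: count how many
    # coroutines are inside the critical section at once.
    async with sem:
        counter["active"] += 1
        counter["peak"] = max(counter["peak"], counter["active"])
        await asyncio.sleep(0.01)  # simulate network latency
        counter["active"] -= 1
        return url

async def demo():
    sem = asyncio.Semaphore(1)
    counter = {"active": 0, "peak": 0}
    urls = [f"https://example.com/example/{i}" for i in range(5)]
    results = await asyncio.gather(*(fake_get_api(sem, counter, u)
                                     for u in urls))
    return results, counter["peak"]
```

With Semaphore(1) the peak concurrency stays at 1 even though all five tasks are created up front, which is exactly the behavior the corrected main() relies on.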
