Controlling the concurrency of HTTP requests using Python's asyncio.Semaphore

I'm trying to figure out a way to limit the number of concurrent HTTP requests made to a server using Python's asyncio and httpx modules. I came across this StackOverflow answer.

It proposes asyncio.Semaphore for stopping multiple consumers from making too many requests. While this answer works perfectly, it uses explicit loop construction, not asyncio.run. When I replace the explicit loop construction with asyncio.run, the behavior of the code changes. Instead of doing all 9 requests, it now executes just three requests and then stops.

import asyncio
from random import randint


async def download(code):
    wait_time = randint(1, 3)
    print('downloading {} will take {} second(s)'.format(code, wait_time))
    await asyncio.sleep(wait_time)  # I/O, context will switch to main function
    print('downloaded {}'.format(code))


sem = asyncio.Semaphore(3)


async def safe_download(i):
    async with sem:  # semaphore limits num of simultaneous downloads
        return await download(i)


async def main():
    tasks = [
        asyncio.ensure_future(safe_download(i))  # creating task starts coroutine
        for i
        in range(9)
    ]
    await asyncio.gather(*tasks, return_exceptions=True)  # await moment all downloads done


if __name__ == '__main__':
    asyncio.run(main())

This prints out:

downloading 0 will take 3 second(s)
downloading 1 will take 1 second(s)
downloading 2 will take 3 second(s)
downloaded 1
downloaded 0
downloaded 2

I had to change await asyncio.gather(*tasks) to await asyncio.gather(*tasks, return_exceptions=True) so that the code doesn't throw a RuntimeError. Without that change it throws the error below (I've got asyncio debug mode turned on).

downloading 0 will take 2 second(s)
downloading 1 will take 3 second(s)
downloading 2 will take 1 second(s)
Traceback (most recent call last):
  File "/home/rednafi/workspace/personal/demo/demo.py", line 66, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/rednafi/workspace/personal/demo/demo.py", line 62, in main
    await asyncio.gather(*tasks)  # await moment all downloads done
  File "/home/rednafi/workspace/personal/demo/demo.py", line 52, in safe_download
    async with sem:  # semaphore limits num of simultaneous downloads
  File "/usr/lib/python3.9/asyncio/locks.py", line 14, in __aenter__
    await self.acquire()
  File "/usr/lib/python3.9/asyncio/locks.py", line 413, in acquire
    await fut
RuntimeError: Task <Task pending name='Task-5' coro=<safe_download() running at /home/rednafi/workspace/personal/demo/demo.py:52> cb=[gather.<locals>._done_callback() at /usr/lib/python3.9/asyncio/tasks.py:764] created at /home/rednafi/workspace/personal/demo/demo.py:58> got Future <Future pending created at /usr/lib/python3.9/asyncio/base_events.py:424> attached to a different loop

However, the only other change is replacing the explicit loop with asyncio.run.

The question is: why did the behavior of the code change? And how can I bring back the old, expected behavior?

The problem is that the Semaphore created at top level caches the event loop that is current at the time of its creation (an event loop automatically created by asyncio and returned by get_event_loop() at startup). asyncio.run(), on the other hand, creates a fresh event loop on each run. As a result, you're trying to await a semaphore that belongs to a different event loop, which fails. This also explains why exactly three downloads run: the first three tasks acquire the semaphore without waiting, while the remaining six have to wait on a future created for the other loop and raise the RuntimeError, which return_exceptions=True then silently collects. As always, hiding the exception without understanding its cause only leads to further issues down the line.
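
To illustrate the "fresh event loop on each run" part, here is a minimal sketch that only compares loop objects. The helper name current_loop is made up for this example. Note that the loop caching described above matches Python 3.9 and earlier, the version shown in the traceback; as far as I know, newer versions bind the synchronization primitives to a loop on first use instead, so the original script may behave differently there.

```python
import asyncio


async def current_loop():
    # Return the event loop that is running this coroutine.
    return asyncio.get_running_loop()


# Each asyncio.run() call creates its own loop and closes it when done.
loop_a = asyncio.run(current_loop())
loop_b = asyncio.run(current_loop())

print(loop_a is loop_b)    # False: two separate loop objects
print(loop_a.is_closed())  # True: the first loop was closed by asyncio.run()
```

Anything that holds on to a loop from outside asyncio.run() is therefore tied to a loop that the coroutines inside it never run on.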

To fix the issue properly, you should create the semaphore while inside asyncio.run(). For example, the simplest fix can look like this:

# ...
sem = None

async def main():
    global sem
    sem = asyncio.Semaphore(3)
    # ...

A more elegant approach is to completely remove sem from top level and explicitly pass it to safe_download:

async def safe_download(i, limit):
    async with limit:
        return await download(i)

async def main():
    # limit parallel downloads to 3 at most
    limit = asyncio.Semaphore(3)
    # you don't need to explicitly call create_task() if you call
    # `gather()` because `gather()` will do it for you
    await asyncio.gather(*[safe_download(i, limit) for i in range(9)])
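
To check that the limit actually holds, one option is to count how many downloads are in flight at once. The following is a self-contained sketch along the lines of the code above. The stats dictionary and its "active"/"peak" keys are additions for this illustration only, and the short sleep stands in for a real HTTP request (for example one made with httpx).

```python
import asyncio


async def download(code, stats):
    # Track how many downloads are running at the same moment.
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # stand-in for the real I/O
    stats["active"] -= 1
    return code


async def safe_download(code, limit, stats):
    async with limit:  # at most 3 coroutines get past this point at once
        return await download(code, stats)


async def main():
    # Created inside asyncio.run(), so it belongs to the running loop.
    limit = asyncio.Semaphore(3)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *[safe_download(i, limit, stats) for i in range(9)]
    )
    return results, stats["peak"]


results, peak = asyncio.run(main())
print(results)  # all 9 downloads finish, in submission order
print(peak)     # highest number of simultaneous downloads observed
```

Because gather() is called without return_exceptions=True here, any error raised by a download propagates out of main() instead of being hidden.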
