
aiohttp: set maximum number of requests per second

How can I set a maximum number of requests per second (i.e., rate-limit them) on the client side using aiohttp?

Although it's not exactly a limit on the number of requests per second, note that since v2.0, when using a ClientSession, aiohttp automatically limits the number of simultaneous connections to 100.

You can modify the limit by creating your own TCPConnector and passing it into the ClientSession. For instance, to create a client limited to 50 simultaneous requests:

import aiohttp

connector = aiohttp.TCPConnector(limit=50)
client = aiohttp.ClientSession(connector=connector)
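
Note that a ClientSession should be closed when you are done with it. A minimal sketch that lets an async with block handle the cleanup (example.com is just a stand-in URL):

import asyncio
import aiohttp

async def main():
    connector = aiohttp.TCPConnector(limit=50)
    # The session closes itself (and its connector) on exit.
    async with aiohttp.ClientSession(connector=connector) as client:
        async with client.get('http://example.com') as resp:
            print(resp.status)

asyncio.run(main())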

In case it's better suited to your use case, there is also a limit_per_host parameter (which is off by default) that you can pass to limit the number of simultaneous connections to the same "endpoint". Per the docs:

limit_per_host (int) – limit for simultaneous connections to the same endpoint. Endpoints are the same if they have an equal (host, port, is_ssl) triple.

Example usage:

import aiohttp

connector = aiohttp.TCPConnector(limit_per_host=50)
client = aiohttp.ClientSession(connector=connector)
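
The two parameters can also be combined. For instance, a sketch of a connector that allows at most 100 connections overall but no more than 10 to any single host:

import aiohttp

# Overall cap of 100 connections, per-host cap of 10.
connector = aiohttp.TCPConnector(limit=100, limit_per_host=10)
client = aiohttp.ClientSession(connector=connector)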

I found one possible solution here: http://compiletoi.net/fast-scraping-in-python-with-asyncio.html

Doing 3 requests at the same time is cool, doing 5000, however, is not so nice. If you try to do too many requests at the same time, connections might start to get closed, or you might even get banned from the website.

To avoid this, you can use a semaphore. It is a synchronization tool that can be used to limit the number of coroutines that do something at some point. We'll just create the semaphore before creating the loop, passing as an argument the number of simultaneous requests we want to allow:

sem = asyncio.Semaphore(5)

Then, we just replace:

page = yield from get(url, compress=True)

by the same thing, but protected by a semaphore:

with (yield from sem):
    page = yield from get(url, compress=True)

This will ensure that at most 5 requests can be done at the same time.
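
The yield from syntax above comes from the older generator-based coroutine style. A minimal sketch of the same pattern in modern async/await syntax, assuming you already have a ClientSession named session:

import asyncio

sem = asyncio.Semaphore(5)

async def fetch(session, url):
    # At most 5 coroutines can hold the semaphore at once, so at
    # most 5 requests are in flight at any given time.
    async with sem:
        async with session.get(url) as resp:
            return await resp.text()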

You could set a delay per request, or group the URLs into batches and throttle the batches to meet the desired frequency.

1. Delay per request

Force the script to wait between requests using asyncio.sleep:

import asyncio
import aiohttp

delay_per_request = 0.5
urls = [
   # put some URLs here...
]

async def app():
    tasks = []
    for url in urls:
        tasks.append(asyncio.ensure_future(make_request(url)))
        # Wait before launching the next request, so new requests
        # start at most 1 / delay_per_request times per second.
        await asyncio.sleep(delay_per_request)

    results = await asyncio.gather(*tasks)
    return results

async def make_request(url):
    print('$$$ making request')
    async with aiohttp.ClientSession() as sess:
        async with sess.get(url) as resp:
            status = resp.status
            text = await resp.text()
            print('### got page data')
            return url, status, text

This can be run with e.g. results = asyncio.run(app()).
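
Note that this spaces out when requests start rather than capping concurrency: with delay_per_request = 0.5, at most two new requests are launched per second, but slow responses can still overlap.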

2. Batch throttle

Using make_request from above, you can request and throttle batches of URLs like this:

import asyncio
import aiohttp
import time

max_requests_per_second = 0.5
urls = [[
   # put a few URLs here...
],[
   # put a few more URLs here...
]]

async def app():
    results = []
    for i, batch in enumerate(urls):
        t_0 = time.time()
        print(f'batch {i}')
        tasks = [asyncio.ensure_future(make_request(url)) for url in batch]
        for t in tasks:
            d = await t
            results.append(d)
        t_1 = time.time()

        # Throttle requests: each batch must take at least
        # batch_size / max_requests_per_second seconds.
        batch_time = (t_1 - t_0)
        batch_size = len(batch)
        wait_time = (batch_size / max_requests_per_second) - batch_time
        if wait_time > 0:
            print(f'Too fast! Waiting {wait_time} seconds')
            # asyncio.sleep instead of time.sleep, so the event
            # loop is not blocked while waiting.
            await asyncio.sleep(wait_time)

    return results

Again, this can be run with asyncio.run(app()).
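
As a worked example: with max_requests_per_second = 0.5 and a batch of 4 URLs, each batch must take at least 4 / 0.5 = 8 seconds, so if the requests themselves finished in 3 seconds, the loop waits another 5 seconds before starting the next batch.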

This is an example without aiohttp, but you can wrap any async method or aiohttp.request call using the Limit decorator:

import asyncio
import time


class Limit(object):
    def __init__(self, calls=5, period=1):
        self.calls = calls
        self.period = period
        self.clock = time.monotonic
        self.last_reset = 0
        self.num_calls = 0

    def __call__(self, func):
        async def wrapper(*args, **kwargs):
            # If the call budget for this period is spent, sleep
            # until the period is over.
            if self.num_calls >= self.calls:
                await asyncio.sleep(self.__period_remaining())

            # Start a new period once the previous one has elapsed.
            period_remaining = self.__period_remaining()
            if period_remaining <= 0:
                self.num_calls = 0
                self.last_reset = self.clock()

            self.num_calls += 1

            return await func(*args, **kwargs)

        return wrapper

    def __period_remaining(self):
        elapsed = self.clock() - self.last_reset
        return self.period - elapsed


@Limit(calls=5, period=2)
async def test_call(x):
    print(x)


async def worker():
    for x in range(100):
        await test_call(x + 1)


asyncio.run(worker())
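
Note that a task woken from the sleep does not re-check the counter before proceeding, so if many tasks call the wrapped coroutine concurrently, a few extra calls can slip into a single period; the decorator behaves exactly as intended for a sequential caller like worker above.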

None of the solutions from the other answers worked for me (I've already tried them) when the API limits the rate based on the time since the end of the previous request. So I'm posting a new one that should work:

import asyncio
import time


class Limiter:
    def __init__(self, calls_limit: int = 5, period: int = 1):
        self.calls_limit = calls_limit
        self.period = period
        self.semaphore = asyncio.Semaphore(calls_limit)
        self.requests_finish_time = []

    async def sleep(self):
        # If the call budget is spent, wait until `period` seconds
        # have passed since the end of an earlier request.
        if len(self.requests_finish_time) >= self.calls_limit:
            sleep_before = self.requests_finish_time.pop(0)
            if sleep_before >= time.monotonic():
                await asyncio.sleep(sleep_before - time.monotonic())

    def __call__(self, func):
        async def wrapper(*args, **kwargs):

            async with self.semaphore:
                await self.sleep()
                res = await func(*args, **kwargs)
                # Record when this request's rate window expires.
                self.requests_finish_time.append(time.monotonic() + self.period)

            return res

        return wrapper
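
The semaphore caps the number of calls in flight at calls_limit, while requests_finish_time delays each new call until period seconds have passed since an earlier call finished, which suits APIs that measure the rate window from the end of a request.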

Usage:

urls = [
    # put some URLs here...
]


@Limiter(calls_limit=5, period=1)
async def api_call(url):
    ...


async def main():
    tasks = [asyncio.create_task(api_call(url)) for url in urls]
    await asyncio.gather(*tasks)


if __name__ == '__main__':
    loop = asyncio.get_event_loop_policy().get_event_loop()
    loop.run_until_complete(main())
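
On Python 3.7+, asyncio.run(main()) is an equivalent and simpler way to start the event loop here.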
