How can I set a maximum number of requests per second (i.e. limit them) on the client side using aiohttp?
Although it's not exactly a limit on the number of requests per second, note that since v2.0, when using a ClientSession, aiohttp automatically limits the number of simultaneous connections to 100.

You can modify the limit by creating your own TCPConnector and passing it into the ClientSession. For instance, to create a client limited to 50 simultaneous requests:
import aiohttp
connector = aiohttp.TCPConnector(limit=50)
client = aiohttp.ClientSession(connector=connector)
In case it's better suited to your use case, there is also a limit_per_host parameter (which is off by default) that you can pass to limit the number of simultaneous connections to the same "endpoint". Per the docs:

limit_per_host (int) – limit for simultaneous connections to the same endpoint. Endpoints are the same if they have an equal (host, port, is_ssl) triple.
Example usage:
import aiohttp
connector = aiohttp.TCPConnector(limit_per_host=50)
client = aiohttp.ClientSession(connector=connector)
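For completeness, here's a minimal sketch of how such a limited session might be used; the fetch helper and placeholder URLs are assumptions for illustration, not part of the original answer:

import asyncio
import aiohttp

async def fetch(client, url):
    # The connector transparently caps how many of these run at once.
    async with client.get(url) as resp:
        return await resp.text()

async def main():
    connector = aiohttp.TCPConnector(limit=50)
    # "async with" ensures the session (and its connector) are closed.
    async with aiohttp.ClientSession(connector=connector) as client:
        urls = ['http://example.com'] * 200  # placeholder URLs
        return await asyncio.gather(*(fetch(client, u) for u in urls))

asyncio.run(main())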
I found one possible solution here: http://compiletoi.net/fast-scraping-in-python-with-asyncio.html
Doing 3 requests at the same time is cool, doing 5000, however, is not so nice. If you try to do too many requests at the same time, connections might start to get closed, or you might even get banned from the website.
To avoid this, you can use a semaphore. It is a synchronization tool that can be used to limit the number of coroutines that do something at some point. We'll just create the semaphore before creating the loop, passing as an argument the number of simultaneous requests we want to allow:
sem = asyncio.Semaphore(5)
Then, we just replace (updated here to modern async/await syntax, since the article's original yield from coroutine style no longer runs on current Python):

page = await get(url, compress=True)

by the same thing, but protected by a semaphore:

async with sem:
    page = await get(url, compress=True)
This will ensure that at most 5 requests can be done at the same time.
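Putting it together, here's a minimal runnable sketch of the semaphore approach with aiohttp; the get helper is an assumed fetch coroutine and the URLs are placeholders, not part of the original answer:

import asyncio
import aiohttp

async def get(session, sem, url):
    # Acquire the semaphore before making the request, release after,
    # so at most 5 requests are ever in flight at once.
    async with sem:
        async with session.get(url) as resp:
            return await resp.text()

async def main():
    sem = asyncio.Semaphore(5)
    async with aiohttp.ClientSession() as session:
        urls = ['http://example.com'] * 20  # placeholder URLs
        return await asyncio.gather(*(get(session, sem, u) for u in urls))

asyncio.run(main())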
You could set a delay per request, or group the URLs in batches and throttle the batches to meet the desired frequency.

Force the script to wait between requests using asyncio.sleep:
import asyncio
import aiohttp

delay_per_request = 0.5
urls = [
    # put some URLs here...
]

async def app():
    tasks = []
    for url in urls:
        tasks.append(asyncio.ensure_future(make_request(url)))
        await asyncio.sleep(delay_per_request)
    results = await asyncio.gather(*tasks)
    return results

async def make_request(url):
    print('$$$ making request')
    async with aiohttp.ClientSession() as sess:
        async with sess.get(url) as resp:
            status = resp.status
            text = await resp.text()
            print('### got page data')
            return url, status, text
This can be run with, e.g., results = asyncio.run(app()).
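As an aside, make_request above opens a new ClientSession for every URL; aiohttp recommends reusing one session across requests. A possible variant (the changed signature is an assumption for illustration):

async def make_request(sess, url):
    # Reuse a caller-owned ClientSession instead of creating one
    # per request; the caller manages the session's lifetime.
    print('$$$ making request')
    async with sess.get(url) as resp:
        status = resp.status
        text = await resp.text()
        print('### got page data')
        return url, status, text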
Using make_request from above, you can request and throttle batches of URLs like this:
import asyncio
import aiohttp
import time

max_requests_per_second = 0.5
urls = [[
    # put a few URLs here...
], [
    # put a few more URLs here...
]]

async def app():
    results = []
    for i, batch in enumerate(urls):
        t_0 = time.time()
        print(f'batch {i}')
        tasks = [asyncio.ensure_future(make_request(url)) for url in batch]
        for t in tasks:
            d = await t
            results.append(d)
        t_1 = time.time()

        # Throttle requests
        batch_time = (t_1 - t_0)
        batch_size = len(batch)
        wait_time = (batch_size / max_requests_per_second) - batch_time
        if wait_time > 0:
            print(f'Too fast! Waiting {wait_time} seconds')
            # Use asyncio.sleep rather than time.sleep so the event
            # loop is not blocked while waiting.
            await asyncio.sleep(wait_time)

    return results
Again, this can be run with asyncio.run(app()).
This is an example without aiohttp, but you can wrap any async method or aiohttp.request using the Limit decorator:
import asyncio
import time

class Limit(object):
    def __init__(self, calls=5, period=1):
        self.calls = calls
        self.period = period
        self.clock = time.monotonic
        self.last_reset = 0
        self.num_calls = 0

    def __call__(self, func):
        async def wrapper(*args, **kwargs):
            if self.num_calls >= self.calls:
                # Budget exhausted: wait out the rest of the period.
                await asyncio.sleep(self.__period_remaining())

            period_remaining = self.__period_remaining()
            if period_remaining <= 0:
                # New period: reset the call counter and the clock.
                self.num_calls = 0
                self.last_reset = self.clock()

            self.num_calls += 1
            return await func(*args, **kwargs)
        return wrapper

    def __period_remaining(self):
        elapsed = self.clock() - self.last_reset
        return self.period - elapsed

@Limit(calls=5, period=2)
async def test_call(x):
    print(x)

async def worker():
    for x in range(100):
        await test_call(x + 1)

asyncio.run(worker())
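To combine this with aiohttp, one could decorate a small fetch coroutine. A sketch under the assumption that requests are awaited sequentially (the decorator's shared counters are not guarded against concurrent callers); the fetch helper and crawl function are illustrative, not part of the original answer:

import aiohttp

@Limit(calls=5, period=1)
async def fetch(session, url):
    # Each call counts against the shared 5-calls-per-period budget.
    async with session.get(url) as resp:
        return await resp.text()

async def crawl(urls):
    async with aiohttp.ClientSession() as session:
        # Awaiting sequentially keeps the Limit bookkeeping simple.
        return [await fetch(session, url) for url in urls]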
None of the solutions from the other answers works (I've already tried them) when the API counts the rate limit from the end of each request, so I'm posting a new one that should:
import asyncio
import time

class Limiter:
    def __init__(self, calls_limit: int = 5, period: int = 1):
        self.calls_limit = calls_limit
        self.period = period
        self.semaphore = asyncio.Semaphore(calls_limit)
        self.requests_finish_time = []

    async def sleep(self):
        if len(self.requests_finish_time) >= self.calls_limit:
            sleep_before = self.requests_finish_time.pop(0)
            if sleep_before >= time.monotonic():
                await asyncio.sleep(sleep_before - time.monotonic())

    def __call__(self, func):
        async def wrapper(*args, **kwargs):
            async with self.semaphore:
                await self.sleep()
                res = await func(*args, **kwargs)
                # Record when this request's cooldown period ends.
                self.requests_finish_time.append(time.monotonic() + self.period)
            return res
        return wrapper
Usage:
@Limiter(calls_limit=5, period=1)
async def api_call(url):
    ...

async def main():
    tasks = [asyncio.create_task(api_call(url)) for url in urls]
    await asyncio.gather(*tasks)

if __name__ == '__main__':
    loop = asyncio.get_event_loop_policy().get_event_loop()
    loop.run_until_complete(main())