Python Tornado rate limiting AsyncHttpClient fetch
Currently using an API that rate limits me to 3000 requests per 10 seconds. I have 10,000 URLs that are fetched using Tornado, because of its asynchronous IO nature.

How do I go about implementing a rate limit to reflect the API limit?
from tornado import ioloop, httpclient

i = 0

def handle_request(response):
    print(response.code)
    global i
    i -= 1
    if i == 0:
        ioloop.IOLoop.instance().stop()

http_client = httpclient.AsyncHTTPClient()
for url in open('urls.txt'):
    i += 1
    http_client.fetch(url.strip(), handle_request, method='HEAD')
ioloop.IOLoop.instance().start()
You can check where the value of i lies relative to an interval of 3000 requests. For example, if i is between 3000 and 6000, you can delay every request in that range by 10 seconds. After 6000, just double the delay. And so on.
import functools

from tornado import ioloop, httpclient

http_client = httpclient.AsyncHTTPClient()
timeout = 10
interval = 3000

for url in open('urls.txt'):
    i += 1
    if i <= interval:
        # i is less than or equal to 3000:
        # just fetch the request without any delay
        http_client.fetch(url.strip(), handle_request, method='GET')
        continue  # skip the rest of the loop
    if i % interval == 1 and i > interval + 1:
        # i is now 6001, or 9001, and so on:
        # double the delay for the next 3000 calls
        timeout += timeout
    loop = ioloop.IOLoop.current()
    loop.call_later(timeout, functools.partial(
        http_client.fetch, url.strip(), handle_request, method='GET'))
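The scheduling rule described above can be isolated into a small pure function, which makes the delay arithmetic easy to check on its own (schedule_delay and the 1-indexed convention are illustrative, not part of Tornado):

```python
def schedule_delay(i, interval=3000, base_timeout=10):
    """Delay in seconds before issuing request number i (1-indexed).

    Requests 1..interval go out immediately; each later batch of
    `interval` requests waits twice as long as the previous batch.
    """
    batch = (i - 1) // interval
    if batch == 0:
        return 0
    return base_timeout * 2 ** (batch - 1)
```

With the defaults, requests 1-3000 get a delay of 0, requests 3001-6000 get 10 seconds, 6001-9000 get 20 seconds, and so on.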
Note: I only tested this code with a small number of requests. It is possible that the value of i changes while the loop runs, because you are also decrementing i in the handle_request function. If that's the case, you should maintain another counter, similar to i, and perform the subtraction on that one instead.
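A fixed batch schedule only approximates the 3000-requests-per-10-seconds window. One common alternative is a token bucket; below is a minimal sketch with an injectable clock so it can be tested without sleeping (the TokenBucket class and its parameters are my own illustration, not from the answer above):

```python
import time

class TokenBucket:
    """Allow roughly `rate` acquisitions per `per` seconds (sketch)."""

    def __init__(self, rate, per, clock=time.monotonic):
        self.capacity = rate          # maximum burst size
        self.fill_rate = rate / per   # tokens regained per second
        self.tokens = float(rate)     # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Spend one token and return True, or return False if empty."""
        now = self.clock()
        # Refill in proportion to the time elapsed since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

For this API the bucket would be TokenBucket(3000, 10.0): call try_acquire() before each http_client.fetch, and if it returns False, re-schedule that URL with loop.call_later instead of fetching immediately.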