Python Tornado rate limiting AsyncHttpClient fetch

Question

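The API I'm currently using rate-limits clients to 3000 requests per 10 seconds. I have 10,000 URLs that I fetch with Tornado, which I chose for its asynchronous I/O.

How do I implement a rate limit that reflects the API limit?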

from tornado import ioloop, httpclient

i = 0

def handle_request(response):
    print(response.code)
    global i
    i -= 1
    if i == 0:
        ioloop.IOLoop.instance().stop()

http_client = httpclient.AsyncHTTPClient()
for url in open('urls.txt'):
    i += 1
    http_client.fetch(url.strip(), handle_request, method='HEAD')
ioloop.IOLoop.instance().start()

Answer 1

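You can check where the value of i falls within each interval of 3000 requests. For example, if i is between 3000 and 6000, you can set a timeout of 10 seconds on every request until 6000. After 6000, just double the timeout, and so on.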

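import functools

from tornado import ioloop
from tornado.httpclient import AsyncHTTPClient

# Assumes i and handle_request are defined as in the question.
# Passing a callback to fetch() only works on Tornado < 6.0.
http_client = AsyncHTTPClient()

timeout = 10     # delay in seconds, doubled before each later batch
interval = 3000  # requests allowed per 10-second window

for url in open('urls.txt'):
    i += 1
    if i <= interval:
        # i is at most 3000:
        # just fetch the request without any delay
        http_client.fetch(url.strip(), handle_request, method='GET')
        continue  # skip the rest of the loop

    if i % interval == 1:
        # i is now 3001, or 6001, and so on
        # as written, the delay becomes 20 s, then 40 s, then 80 s
        timeout += timeout  # double the delay for the next 3000 calls

    # schedule the fetch to start `timeout` seconds from now
    loop = ioloop.IOLoop.current()
    loop.call_later(timeout, callback=functools.partial(http_client.fetch, url.strip(), handle_request, method='GET'))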

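Note: I only tested this code with a small number of requests. Because you decrement i in the handle_request function, the value of i might change while the loop is running. If that is the case, you should maintain another variable similar to i and perform the subtraction on that one.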
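
For newer Tornado versions, where fetch() returns an awaitable instead of taking a callback, the same batching idea could be sketched roughly as below. This is not the answer's code, only a minimal sketch under some assumptions: fetch_rate_limited and its fetch parameter are hypothetical names, fetch is any coroutine function that takes a URL, and the pause is deliberately conservative (a full window after each batch finishes). Since gather waits for the whole batch, no separate counter like i is needed.

```python
import asyncio


async def fetch_rate_limited(urls, fetch, limit=3000, window=10.0):
    """Call fetch(url) for every URL, at most `limit` calls per batch.

    `fetch` is a hypothetical coroutine function supplied by the caller.
    After each batch completes, sleep for `window` seconds before starting
    the next one, so two batches never share the same window.
    """
    results = []
    for start in range(0, len(urls), limit):
        batch = urls[start:start + limit]
        # Run the whole batch concurrently; exceptions are collected
        # as result values instead of aborting the other requests.
        batch_results = await asyncio.gather(
            *(fetch(url) for url in batch), return_exceptions=True
        )
        results.extend(batch_results)
        # Pause only if another batch is still waiting.
        if start + limit < len(urls):
            await asyncio.sleep(window)
    return results
```

With Tornado 5 or later on Python 3, which runs on the asyncio event loop, fetch could plausibly be a small wrapper that awaits AsyncHTTPClient().fetch(url, method='HEAD'), with the whole thing started through asyncio.run or IOLoop.current().run_sync. Tornado's own max_clients setting also limits how many requests are in flight at once, so the real request rate may be lower than the batch size suggests.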

Solution 1
1 2017-04-29 12:29:38
