
Limiting the number of HTTP requests per second in Python

I've written a script that fetches URLs from a file and sends HTTP requests to all the URLs concurrently. I now want to limit the number of HTTP requests per second and the bandwidth per interface (eth0, eth1, etc.) in a session. Is there any way to achieve this in Python?

You could use a Semaphore object, which is part of the Python standard library (see the threading module in the Python documentation).

Or, if you want to work with threads directly, you could use wait([timeout]), which is available on threading.Event and threading.Condition objects.
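One reading of wait([timeout]) is threading.Event.wait, which blocks for up to timeout seconds and returns False if the event was never set. A minimal sketch of using it as an interruptible pause between requests follows. The names paced_calls, fetch, and interval are hypothetical, and fetch stands in for whatever performs the HTTP request.

```python
import threading
import time


def paced_calls(items, fetch, interval, stop_event=None):
    """Call fetch(item) for each item, pausing `interval` seconds between calls.

    Setting `stop_event` from another thread ends the loop early.
    """
    stop_event = stop_event or threading.Event()
    results = []
    for index, item in enumerate(items):
        # Event.wait() returns True only if the event was set, so a True
        # result means "stop now" and False means the pause simply elapsed.
        if index > 0 and stop_event.wait(interval):
            break
        results.append(fetch(item))
    return results
```

With interval=0.5 this issues at most two calls per second from the calling thread. Unlike a plain sleep, the pause can be cut short by setting the event.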

No library bundled with Python works at the level of Ethernet or other network interfaces. The lowest level you can reach is the socket module.
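At the socket level, a connection can at least be tied to one interface by binding it to that interface's local IP address before connecting. The sketch below assumes you already know the address assigned to the interface. The helper open_via is a hypothetical name. It only selects the source address and does not limit bandwidth.

```python
import socket


def open_via(local_ip, host, port, timeout=5.0):
    """Open a TCP connection whose source address is `local_ip`."""
    # source_address=(ip, 0) binds the socket to that local IP and lets
    # the operating system choose the source port.
    return socket.create_connection(
        (host, port), timeout=timeout, source_address=(local_ip, 0)
    )
```

For example, open_via("192.0.2.10", "example.com", 80) would connect from the interface that owns 192.0.2.10. That address is a placeholder, not a value from the question.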

Based on your reply, here's my suggestion. Note the call to active_count: use it only to check that your script runs just two worker threads. In this case it will actually report three, because the first is your script's main thread and the other two are the URL requests.

import time
import requests
import threading

# Limit the number of threads.
pool = threading.BoundedSemaphore(2)

def worker(u):
    try:
        # Request passed URL.
        r = requests.get(u)
        print(r.status_code)
    finally:
        # Release lock for other threads, even if the request fails.
        pool.release()
    # Show the number of active threads.
    print(threading.active_count())

def req():
    # Get URLs from a text file, remove white space, skip blank lines.
    with open('urllist.txt') as f:
        urls = [url.strip() for url in f if url.strip()]
    for u in urls:
        # Thread pool.
        # Blocks other threads (more than the set limit).
        pool.acquire(blocking=True)
        # Create a new thread.
        # Pass each URL (i.e. u parameter) to the worker function.
        t = threading.Thread(target=worker, args=(u, ))
        # Start the newly created thread.
        t.start()

req()

You could use a worker concept like the one described in the documentation: https://docs.python.org/3.4/library/queue.html

Add a wait (for example time.sleep()) inside your workers so that they pause between requests (in the documentation's example: inside the "while True" loop, after task_done()).

Example: 5 worker threads with a waiting time of 1 second between requests will perform fewer than 5 fetches per second.
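The worker idea above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the documentation. The names run_workers, fetch, and delay are hypothetical, fetch stands in for the real HTTP call, and a None sentinel is used to stop each worker.

```python
import queue
import threading
import time


def run_workers(urls, fetch, num_workers=5, delay=1.0):
    """Process urls with num_workers threads; each pauses `delay` s after a task."""
    tasks = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            url = tasks.get()
            if url is None:           # sentinel: no more work for this worker
                tasks.task_done()
                break
            try:
                outcome = fetch(url)
            except Exception as exc:  # record the failure, keep the worker alive
                outcome = exc
            with results_lock:
                results.append((url, outcome))
            tasks.task_done()
            time.sleep(delay)         # the wait between requests

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)               # one sentinel per worker
    tasks.join()
    for t in threads:
        t.join()
    return results
```

Each worker starts at most one request per delay seconds, so num_workers workers together stay at or below num_workers / delay requests per second. The real rate is lower because every fetch also takes time.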

Note that the solution below still sends the requests serially but limits the TPS (transactions per second).

TL;DR: There is a class that keeps a count of the number of calls that can still be made in the current second. The count is decremented for every call that is made and refilled every second.

import time
from multiprocessing import Process, Value

# Naive TPS regulation

# This class holds a bucket of tokens which are refilled every second based on the expected TPS
class TPSBucket:

    def __init__(self, expected_tps):
        self.number_of_tokens = Value('i', 0)
        self.expected_tps = expected_tps
        self.bucket_refresh_process = Process(target=self.refill_bucket_per_second) # process to constantly refill the TPS bucket

    def refill_bucket_per_second(self):
        while True:
            print("refill")
            self.refill_bucket()
            time.sleep(1)

    def refill_bucket(self):
        self.number_of_tokens.value = self.expected_tps
        print('bucket count after refill', self.number_of_tokens.value)

    def start(self):
        self.bucket_refresh_process.start()

    def stop(self):
        self.bucket_refresh_process.kill()

    def get_token(self):
        response = False
        if self.number_of_tokens.value > 0:
            with self.number_of_tokens.get_lock():
                if self.number_of_tokens.value > 0:
                    self.number_of_tokens.value -= 1
                    response = True

        return response

def test():
    tps_bucket = TPSBucket(expected_tps=1) ## Let's say I want to send requests 1 per second
    tps_bucket.start()
    total_number_of_requests = 60 ## Let's say I want to send 60 requests
    request_number = 0
    t0 = time.time()
    while True:
        if tps_bucket.get_token():
            request_number += 1

            print('Request', request_number) ## This is my request

            if request_number == total_number_of_requests:
                break

    print(time.time() - t0, ' time elapsed') ## Some metrics to tell me how long everything took
    tps_bucket.stop()


if __name__ == "__main__":
    test()
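
As a possible alternative to the busy loop and the separate refill process, the same per-second budget can be tracked in-process with time.monotonic() and a lock. This is a sketch, not the method used above. IntervalLimiter and its acquire method are hypothetical names, and the class spaces calls evenly (one every 1 / tps seconds) instead of refilling a bucket once per second.

```python
import threading
import time


class IntervalLimiter:
    """Allow at most `tps` calls per second by spacing them 1/tps apart."""

    def __init__(self, tps):
        self.interval = 1.0 / tps
        self.lock = threading.Lock()
        self.next_allowed = time.monotonic()

    def acquire(self):
        """Block until the caller may send its next request."""
        with self.lock:
            now = time.monotonic()
            wait = max(0.0, self.next_allowed - now)
            # Reserve the following slot before sleeping so that other
            # threads calling acquire() queue up behind this one.
            self.next_allowed = max(now, self.next_allowed) + self.interval
        if wait > 0:
            time.sleep(wait)
```

A caller would invoke limiter.acquire() immediately before each request. The sleep happens outside the lock, so the limiter can be shared between threads.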
