Limiting concurrency and rate for Python threads

Question

Given a number of threads, I want to limit the rate of calls to the worker function to, say, one per second.

My idea was to keep track of the last time a call was made across all threads and compare it to the current time in each thread. Then, if current_time - last_time < rate, I let the thread sleep for a bit. Something is wrong with my implementation; I presume I may have gotten the wrong idea about how locks work.

My code:

from Queue import Queue
from threading import Thread, Lock, RLock
import time

num_worker_threads = 2
rate = 1
q = Queue()
lock = Lock()
last_time = [time.time()]

def do_work(i, idx):
    # Do work here, print is just a dummy.
    print('Thread: {0}, Item: {1}, Time: {2}'.format(i, idx, time.time()))

def worker(i):
    while True:
        lock.acquire()
        current_time = time.time()
        interval = current_time - last_time[0]
        last_time[0] = current_time
        if interval < rate:
            time.sleep(rate - interval)
        lock.release()
        item = q.get()
        do_work(i, item)
        q.task_done()

for i in range(num_worker_threads):
    t = Thread(target=worker, args=[i])
    t.daemon = True
    t.start()

for item in xrange(10):
    q.put(item)

q.join()

I was expecting to see one call per second to do_work; however, I mostly get 2 calls at the same time (1 for each thread), followed by a one-second pause. What is wrong?

Edit: the advice to simply throttle the rate at which items are put into the queue was good. However, I remembered that I also have to handle the case in which the workers re-add items to the queue; canonical examples are pagination or back-off-and-retry in network tasks. I came up with the following. I guess that for actual network tasks the eventlet/gevent libraries may be easier on resources, but this is just an example. It uses a priority queue to pile up the requests and an extra thread to shovel items from the pile into the actual task queue at an even rate. I simulated re-insertion into the pile by the workers; re-inserted items are then handled first.

import time
import random

from Queue import Queue, PriorityQueue
from threading import Thread

rate = 0.1

def worker(q, q_pile, idx):
    while True:
        item = q.get()
        print("Thread: {0} processed: {1}".format(idx, item[1]))
        if random.random() > 0.3:
            print("Thread: {1} reinserting item: {0}".format(item[1], idx))
            q_pile.put((-1 * time.time(), item[1]))
        q.task_done()

def schedule(q_pile, q):
    while True:
        if not q_pile.empty():
            print("Items on pile: {0}".format(q_pile.qsize()))
            q.put(q_pile.get())
            q_pile.task_done()
        time.sleep(rate)

def main():

    q_pile = PriorityQueue()
    q = Queue()

    for i in range(5):
        t = Thread(target=worker, args=[q, q_pile, i])
        t.daemon = True
        t.start()

    t_schedule = Thread(target=schedule, args=[q_pile, q])
    t_schedule.daemon = True
    t_schedule.start()

    for i in range(10):
        q_pile.put((-1 * time.time(), i))
    q_pile.join()
    q.join()

if __name__ == '__main__':
    main()
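The claim that re-inserted items are handled first follows from the pile's key, -1 * time.time(): PriorityQueue hands out the smallest key first, so the most recently queued entry wins. A minimal sketch of just that ordering, assuming Python 3 (where the module is `queue` rather than `Queue`) and using hypothetical fixed timestamps instead of time.time() so the result is deterministic:

```python
import queue

# Keys follow the question's scheme: priority = -1 * timestamp.
# The timestamps below are hypothetical fixed values, not time.time().
pile = queue.PriorityQueue()
pile.put((-1 * 100.0, "item-a"))   # queued at t=100
pile.put((-1 * 101.0, "item-b"))   # queued at t=101
pile.put((-1 * 105.0, "retry-a"))  # re-inserted by a worker at t=105

order = []
while not pile.empty():
    _, item = pile.get()           # smallest key, i.e. newest entry, first
    order.append(item)

print(order)  # ['retry-a', 'item-b', 'item-a']
```

A side effect of this key is that the initial items also come out newest-first, so the pile behaves like a stack rather than a FIFO.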

Answer 1

It seems weird to me to try to limit the rate across multiple threads. If you limit each thread independently, you can avoid all the locking nonsense.
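Per-thread pacing could look roughly like the sketch below, assuming Python 3; `throttled_worker`, `start_offset` and `per_thread` are hypothetical names, and each thread is handed its own items instead of sharing the question's queue. No lock is shared: each thread only spaces its own calls. To approximate a global target of one call per `rate` seconds with N threads, each thread uses an interval of N * rate and starts i * rate later so the threads interleave.

```python
import threading
import time

def throttled_worker(items, interval, start_offset, timestamps):
    """Handle `items`, keeping this thread's own calls at least `interval` apart.

    No lock is shared between threads: each one only paces itself.
    """
    next_allowed = time.monotonic() + start_offset  # stagger thread start times
    for item in items:
        delay = next_allowed - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        started = time.monotonic()
        timestamps.append(started)  # stand-in for do_work(i, item)
        next_allowed = started + interval

rate = 0.05        # target: about one call per `rate` seconds overall
num_threads = 2
per_thread = [[] for _ in range(num_threads)]
threads = [
    threading.Thread(
        target=throttled_worker,
        # Each thread runs num_threads times slower than the global target
        # and starts i * rate later, so the threads interleave.
        args=(range(4), num_threads * rate, i * rate, per_thread[i]),
    )
    for i in range(num_threads)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The trade-off is that nothing coordinates the threads after the initial stagger, so the aggregate spacing is only as even as each thread's sleeps.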

Just a guess, but I think you want to set last_time[0] to time.time() (not current_time) after the sleep.
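Applied to the question's shared-lock approach, that suggestion gives something like the following sketch, assuming Python 3. `wait_for_turn` is a hypothetical helper, and time.monotonic() is used in place of time.time() (an assumption, so wall-clock adjustments cannot shorten the gap); the fix itself is the same: record the time after sleeping.

```python
import threading
import time

rate = 0.05                     # minimum seconds between calls (short for a demo)
lock = threading.Lock()
last_time = [time.monotonic()]  # shared "time of the previous call"

def wait_for_turn():
    """Block until `rate` seconds have passed since the previous call; return the call time."""
    with lock:
        interval = time.monotonic() - last_time[0]
        if interval < rate:
            time.sleep(rate - interval)
        # The fix: record the time *after* sleeping, not the pre-sleep current_time.
        last_time[0] = time.monotonic()
        return last_time[0]

call_times = []

def worker(n_calls):
    for _ in range(n_calls):
        call_times.append(wait_for_turn())  # do_work(...) would run here

threads = [threading.Thread(target=worker, args=(3,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
call_times.sort()
```

Sleeping while holding the lock is deliberate here: the other threads queue behind it, which is what serializes the calls.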

Answer 2

I get mostly 2 calls at the same time (1 for each thread), followed by a one second pause. What is wrong?

That's exactly what you should expect from your implementation. Let's say the time t starts at 0 and the rate is 1:

Thread1 does this:

    lock.acquire() # both threads wait here, one gets the lock
    current_time = time.time() # we start at t=0
    interval = current_time - last_time[0] # so interval = 0
    last_time[0] = current_time # last_time = t = 0
    if interval < rate: # rate = 1 so we sleep
        time.sleep(rate - interval) # to t=1
    lock.release() # now the other thread wakes up
    # it's t=1 and we do the job

Thread2 does this:

    lock.acquire() # we get the lock at t=1 
    current_time = time.time() # still t=1
    interval = current_time - last_time[0] # interval = 1
    last_time[0] = current_time
    if interval < rate: # interval = rate = 1 so we don't sleep
        time.sleep(rate - interval)
    lock.release() 
    # both threads start the work around t=1
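The trace above can be replayed without real threads using a virtual clock. This is a simplified model, assuming the lock passes straight to the waiting thread and do_work takes no time; `simulate` and `fix_after_sleep` are hypothetical names. With `fix_after_sleep=True` it records the time after sleeping instead, as Answer 1 suggests.

```python
def simulate(n_calls, rate=1.0, fix_after_sleep=False):
    """Return the virtual times at which do_work starts, one entry per lock turn.

    Models the question's worker loop: each turn, the next thread takes the
    lock at the current virtual time, and do_work itself takes no time.
    """
    clock = 0.0      # virtual "now"; time starts at t=0 as in the trace
    last_time = 0.0  # the shared last_time[0]
    starts = []
    for _ in range(n_calls):
        current_time = clock
        interval = current_time - last_time
        if not fix_after_sleep:
            last_time = current_time  # original code: recorded BEFORE sleeping
        if interval < rate:
            clock += rate - interval  # time.sleep(rate - interval)
        if fix_after_sleep:
            last_time = clock         # Answer 1: recorded AFTER sleeping
        starts.append(clock)          # lock released, do_work runs now
    return starts

print(simulate(6))                        # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
print(simulate(6, fix_after_sleep=True))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

The first line reproduces the observed pairs: every second turn sees an interval of exactly `rate` because the previous holder stamped last_time before its sleep.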

My advice is to limit the speed at which the items are put into the queue.
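A minimal sketch of that advice, assuming Python 3: the workers do no timing at all, and the loop that fills the queue sleeps between put() calls. `done_times` and `done_lock` are hypothetical names used only to record when each item is handled.

```python
import queue
import threading
import time

rate = 0.1              # seconds between items handed to the workers
q = queue.Queue()
done_times = []
done_lock = threading.Lock()

def worker():
    while True:
        item = q.get()
        with done_lock:
            done_times.append(time.monotonic())  # stand-in for do_work(i, item)
        q.task_done()

# The workers do no rate limiting of their own.
for _ in range(2):
    threading.Thread(target=worker, daemon=True).start()

# Throttle the producer instead: at most one put() per `rate` seconds.
for item in range(5):
    q.put(item)
    time.sleep(rate)

q.join()
done_times.sort()
```

With workers faster than `rate`, items reach do_work about one per `rate` seconds; if a worker is slower, items simply wait in the queue and the workers become the bottleneck.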

Answers: 2

Answer 1: score 1, posted 2011-09-28 17:03:36

Answer 2: score 1, accepted, posted 2011-09-28 17:21:18