How to properly implement producer consumer in Python

I have two threads in a producer consumer pattern. When the consumer receives data it calls a time-consuming function expensive() and then enters a for loop.

But if new data arrives while the consumer is working, it should abort the current work (exit the loop) and start over with the new data.

I tried with a queue.Queue, something like this:

import queue

q = queue.Queue()

def producer():
    while True:
        ...
        q.put(d)

def consumer():
    while True:
        d = q.get()
        expensive(d)
        for i in range(10000):
            ...
            if not q.empty():
                # New data arrived: abandon the loop and fetch it.
                break

But the problem with this code is that if the producer puts data too fast and the queue accumulates many items, the consumer will do the expensive(d) call plus one loop iteration and then abort, for each item, which is time-consuming. The code should work, but it is not optimized.
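One way to avoid paying for every stale item, sketched here on the assumption that only the most recent item matters, is to drain the queue before calling expensive(). The helper name get_latest is hypothetical and not part of the queue module.

```python
import queue


def get_latest(q):
    """Block for one item, then discard older items and return the newest.

    Hypothetical helper: assumes items that were superseded while waiting
    in the queue never need to be processed.
    """
    item = q.get()  # wait until at least one item is available
    while True:
        try:
            item = q.get_nowait()  # a newer item replaces the one we hold
        except queue.Empty:
            return item  # queue drained: this is the newest item
```

In consumer(), d = get_latest(q) would take the place of d = q.get(), so a burst of queued items costs a single expensive(d) call instead of one per item.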

Without modifying the code in expensive, one solution could be to run it as a separate process, which gives you the ability to terminate it prematurely. Since there's no mention of how long expensive runs, however, this may or may not be more time-efficient.

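import multiprocessing as mp
import queue

q = queue.Queue()


def producer():
    while True:
        ...
        q.put(d)

def consumer():
    while True:
        d = q.get()
        # Run expensive() in a child process so that it can be terminated.
        exp = mp.Process(target=expensive, args=(d,))
        exp.start()
        for i in range(10000):
            ...
            if not q.empty():
                exp.terminate() # or exp.kill()
                break
        exp.join()  # wait for the child process to exit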
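The idea can also be written as a small self-contained sketch that waits for the child process and only then enters the loop, as in the question. The helper run_cancellable, its poll_interval parameter, and the sleep-based expensive() stand-in are assumptions made for illustration.

```python
import multiprocessing as mp
import queue
import time


def expensive(d):
    # Stand-in for the slow function from the question (assumption).
    time.sleep(d)


def run_cancellable(target, arg, should_cancel, poll_interval=0.05):
    """Run target(arg) in a child process, terminating it on request.

    Returns True if the call finished on its own, False if it was
    terminated because should_cancel() returned True.
    """
    proc = mp.Process(target=target, args=(arg,))
    proc.start()
    while proc.is_alive():
        if should_cancel():
            proc.terminate()
            proc.join()  # reap the terminated child
            return False
        proc.join(poll_interval)  # wait briefly, then check again
    proc.join()
    return True


def consumer(q):
    while True:
        d = q.get()
        if not run_cancellable(expensive, d, lambda: not q.empty()):
            continue  # newer data arrived: start over with it
        for i in range(10000):
            ...  # the follow-up work from the question
            if not q.empty():
                break
```

Because the child runs in a separate process, any result of expensive() would have to be sent back explicitly (for example through a multiprocessing.Queue), and on platforms that spawn rather than fork, the code that starts the threads belongs under an if __name__ == "__main__": guard.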

Well, one way is to use a queue design that keeps internal lists of waiting and working threads. You can then create several consumer threads to wait on the queue and, when work arrives, set a known consumer thread to do the work. When the thread has finished, it calls into the queue to remove itself from the working list and add itself to the waiting list.

The consumer threads each have an 'abort' atomic that can signal the thread to finish early. There will be some latency while the thread performs inner loops, but that will not matter...

If new work arrives at the queue from the producer and the working list is not empty, the 'abort' bool of the working thread(s) can be set and their priority set to the minimum possible. The new work can then be dispatched onto one of the waiting threads from the pool, setting it working.

The waiting threads will need a 'start' function that signals an event/semaphore/condvar that the waiting thread... well... waits on. That allows the producer that supplied the work to set that specific thread running, rather than the 'usual' practice where any thread from a pool may pick up work.

Such a design allows new work to be started 'immediately', makes the previous work thread irrelevant by de-prioritizing it, and avoids the overheads of thread/process termination.
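A rough Python sketch of this design follows. The class names Worker and Dispatcher and the example work() function are hypothetical; the 'abort' atomic is modelled with threading.Event, and the priority change is left out because Python's threading module offers no way to set thread priority.

```python
import threading
import time


class Worker(threading.Thread):
    def __init__(self, pool, work_fn):
        super().__init__(daemon=True)
        self.pool = pool
        self.work_fn = work_fn
        self.abort = threading.Event()  # set to ask the worker to stop early
        self.wake = threading.Event()   # the event the waiting worker waits on
        self.item = None

    def start_work(self, item):
        # The 'start' function: hand this specific worker an item and wake it.
        self.item = item
        self.abort.clear()
        self.wake.set()

    def run(self):
        while True:
            self.wake.wait()
            self.wake.clear()
            self.work_fn(self.item, self.abort)
            self.pool.release(self)  # move back from working to waiting


class Dispatcher:
    def __init__(self, n_workers, work_fn):
        self.cond = threading.Condition()
        self.working = []
        self.waiting = [Worker(self, work_fn) for _ in range(n_workers)]
        for w in self.waiting:
            w.start()

    def submit(self, item):
        with self.cond:
            for w in self.working:
                w.abort.set()       # current work is now obsolete
            while not self.waiting:
                self.cond.wait()    # wait until some worker is released
            worker = self.waiting.pop()
            self.working.append(worker)
            worker.start_work(item)

    def release(self, worker):
        with self.cond:
            self.working.remove(worker)
            self.waiting.append(worker)
            self.cond.notify()


def work(item, abort):
    # Example work function: checks the abort flag once per iteration.
    for i in range(10000):
        if abort.is_set():
            return
        time.sleep(0.001)  # stand-in for one unit of real work
```

The producer would call Dispatcher(2, work).submit(d) instead of q.put(d). With at least two workers, new work starts on an idle thread while the aborted one is still winding down; the abort latency is bounded by the time between two checks of the flag.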
