
python multiprocessing shared queue re-ordering

I have a server and several clients. They all share a task queue and a results queue, both multiprocessing.Queues. However, whenever a client finishes a task and puts its result on the results queue, I want the server to look at the result and, based on it, re-order the task queue.

This of course means popping everything off the task queue and re-adding it. During this re-ordering process, I want the clients to be blocked from touching the task queue. My question is how to get the server to recognize when a result is added to the results queue and react by locking the task queue and reordering it while the queue is protected. The invariant is that the server must re-order after every returned result before clients get a new task.

I suppose a simple (but wrong) way would be to have a multiprocessing.Value act as a boolean: whenever a result is added, the client flips it to True, meaning a result has been added. The server could poll this value, but ultimately it could miss another client coming in between polls and adding another result.
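To make that pitfall concrete, here is a minimal single-process sketch of the flag idea (the function name and the simulated clients are illustrative, not real code from anyone's system): two clients flip the flag between server polls, and the server counts only one event:

```python
import multiprocessing as mp

def flawed_poll_demo():
    # A shared boolean flag, as described above (illustrative sketch).
    result_added = mp.Value('b', False)
    results_seen = 0

    # Two clients finish and flip the flag before the server polls again.
    result_added.value = True   # client A adds a result
    result_added.value = True   # client B adds a result; flag is already True

    # The server polls once: it sees one "event", not two.
    if result_added.value:
        result_added.value = False
        results_seen += 1

    return results_seen  # 1, even though two results were added

print(flawed_poll_demo())
```

A shared counter instead of a boolean would avoid the lost update, but the server would still have to poll.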

Any thoughts appreciated.

** The 'multithreading' tag is just because it's a very similar idea to threading; I don't think the process/thread distinction matters much here.

Let's try some code - some progress is better than none ;-) Part of the problem is to ensure that nothing gets taken from the task queue if the result queue has something in it, right? So the queues are intimately connected. This approach puts both queues under the protection of a single lock, and uses Conditions to avoid any need for polling:

Setup, done in the server. taskQ, resultQ, taskCond and resultCond must be passed to the client processes (lock need not be explicitly passed - it's contained in the Conditions):

import multiprocessing as mp
taskQ = mp.Queue()
resultQ = mp.Queue()
lock = mp.Lock()
# both conditions share lock
taskCond = mp.Condition(lock)
resultCond = mp.Condition(lock)

Client gets a task; all clients use this function. Note that a task won't be consumed so long as the result queue has something in it:

def get_task():
    taskCond.acquire()
    while taskQ.qsize() == 0 or resultQ.qsize():
        taskCond.wait()
    # resultQ is empty and taskQ has something
    task = taskQ.get()
    taskCond.release()
    return task

Client has a result:

with resultCond:
    resultQ.put(result)
    # only the server waits on resultCond
    resultCond.notify()

Server loop:

resultCond.acquire()
while True:
    while resultQ.qsize() == 0:
        resultCond.wait()
    # operations on both queues in all clients are blocked now
    # ... drain resultQ, reorder taskQ ...
    taskCond.notify_all()
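The drain-and-reorder step that the comment elides might look like the following single-process sketch. queue.Queue stands in for mp.Queue so it can run standalone, and the priority function is a placeholder for whatever ordering your results actually imply; it assumes the caller already holds the lock shared by both Conditions:

```python
import queue

def drain_and_reorder(resultQ, taskQ, priority):
    """Empty resultQ, then rebuild taskQ in priority order.

    Must be called while holding the lock shared by both Conditions,
    so no client can touch either queue mid-reorder. `priority` maps
    a task to a sort key; deriving it from the drained results is
    application logic left to the caller.
    """
    results = []
    while resultQ.qsize():          # reliable here: the lock is held
        results.append(resultQ.get())

    tasks = []
    while taskQ.qsize():
        tasks.append(taskQ.get())

    for task in sorted(tasks, key=priority):
        taskQ.put(task)
    return results

# Example: reorder so smaller tasks run first.
taskQ, resultQ = queue.Queue(), queue.Queue()
for t in (3, 1, 2):
    taskQ.put(t)
resultQ.put('done')
drained = drain_and_reorder(resultQ, taskQ, priority=lambda t: t)
reordered = [taskQ.get() for _ in range(3)]
print(drained, reordered)  # ['done'] [1, 2, 3]
```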

Notes:

  1. qsize() is usually probabilistic, but because all queue operations are done while the lock is held, it's reliable in this context.

  2. In fact, because all queue operations are protected by our own lock here, there's really no need to use mp.Queues. For example, an mp.Manager().list() would work too (any shared structure). Perhaps a list would be easier to work with when you're rearranging tasks?

  3. One part I don't like much: when the server does taskCond.notify_all(), some clients may be waiting to get a new task, while others may be waiting to return a new result. They may run in any order. As soon as any client waiting to return a result gets a chance, all clients waiting to get a task will block - but before that happens, tasks may be consumed. "The problem" here, of course, is that we have no idea a new result is waiting until something is actually added to the result queue.

For the last point, perhaps changing the "client has result" code to:

resultQ.put(result)
with resultCond:
    resultCond.notify()

would be better. Unsure. It does make the code significantly harder to reason about, because it's then no longer true that all queue operations are done under the protection of our lock.
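Putting the pieces together, here is a runnable end-to-end sketch of the pattern. It uses threads rather than processes purely so it works as a self-contained snippet (as the question notes, the distinction shouldn't matter for the logic); the "reorder" step simply sorts the remaining tasks, standing in for whatever result-driven policy the real server applies, and the DONE sentinel for shutting clients down is an addition not in the original answer:

```python
import queue
import threading

# Same structure as the setup above, with threading stand-ins.
taskQ = queue.Queue()
resultQ = queue.Queue()
lock = threading.Lock()
taskCond = threading.Condition(lock)    # both conditions share lock
resultCond = threading.Condition(lock)

DONE = None  # sentinel telling a client to exit (illustrative addition)

def get_task():
    with taskCond:
        # Don't hand out a task while an unprocessed result is pending.
        while taskQ.qsize() == 0 or resultQ.qsize():
            taskCond.wait()
        return taskQ.get()

def client():
    while True:
        task = get_task()
        if task is DONE:
            break
        result = task * 10              # stand-in for real work
        with resultCond:
            resultQ.put(result)
            resultCond.notify()         # only the server waits on resultCond

def server(n_tasks, n_clients, collected):
    with resultCond:
        while len(collected) < n_tasks:
            while resultQ.qsize() == 0:
                resultCond.wait()
            # Both queues are frozen for clients while we hold the lock.
            while resultQ.qsize():
                collected.append(resultQ.get())
            # "Reorder": here, just sort the remaining tasks.
            remaining = []
            while taskQ.qsize():
                remaining.append(taskQ.get())
            for t in sorted(remaining):
                taskQ.put(t)
            taskCond.notify_all()
        # All results are in: tell the clients to exit.
        for _ in range(n_clients):
            taskQ.put(DONE)
        taskCond.notify_all()

collected = []
for t in [3, 1, 4, 1, 5]:
    taskQ.put(t)
clients = [threading.Thread(target=client) for _ in range(2)]
for c in clients:
    c.start()
srv = threading.Thread(target=server, args=(5, 2, collected))
srv.start()
for c in clients:
    c.join()
srv.join()
print(sorted(collected))  # [10, 10, 30, 40, 50]
```

To adapt it back to processes, replace queue/threading with mp.Queue, mp.Lock and mp.Condition and pass the shared objects to the client processes, as in the setup code above.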
