
Multiprocessing Queue - child processes sometimes get stuck and are not reaped

First of all, I apologize if the title is a bit weird, but I literally could not think of how to put the problem I am facing into a single line.

So I have the following code:

import time
from multiprocessing import Process, current_process, Manager
from multiprocessing import JoinableQueue as Queue

# from threading import Thread, current_thread
# from queue import Queue


def checker(q):
    count = 0
    while True:
        if not q.empty():
            data = q.get()
            # print(f'{data} fetched by {current_process().name}')
            # print(f'{data} fetched by {current_thread().name}')
            q.task_done()
            count += 1
        else:
            print('Queue is empty now')
            print(current_process().name, '-----', count)
            # print(current_thread().name, '-----', count)


if __name__ == '__main__':
    t = time.time()
    # m = Manager()
    q = Queue()
    # with open("/tmp/c.txt") as ifile:
    #     for line in ifile:
    #         q.put((line.strip()))
    for i in range(1000):
        q.put(i)
    time.sleep(0.1)
    procs = []
    for _ in range(2):
        p = Process(target=checker, args=(q,), daemon=True)
        # p = Thread(target=checker, args=(q,))
        p.start()
        procs.append(p)
    q.join()
    for p in procs:
        p.join()

Sample outputs

1: When the process just hangs

Queue is empty now
Process-2 ----- 501
output hangs at this point

2: When everything works just fine.

Queue is empty now
Process-1 ----- 515
Queue is empty now
Process-2 ----- 485

Process finished with exit code 0

The behavior is intermittent: it happens sometimes, but not always.

I have also tried using Manager.Queue() in place of multiprocessing.Queue(), but with no success; both exhibit the same issue.

I tested this with both multiprocessing and multithreading and got exactly the same behavior, with one slight difference: with multithreading the behavior occurs much less often than with multiprocessing.

So I think there is something I am missing conceptually or doing wrong, but I am not able to catch it; I have spent way too much time on this and my mind is no longer seeing something that may be very basic.

So any help is appreciated.

I believe you have a race condition in the checker method. You check whether the queue is empty and then dequeue the next item in separate steps. It's usually not a good idea to separate these two operations without mutual exclusion or locking, because the state of the queue may change between the check and the get: the queue may be non-empty when checked, but another process may dequeue the waiting work before the process that passed the check is able to do so.
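
If you want to keep the polling structure, the check and the dequeue can be collapsed into one atomic call with get_nowait(), which raises queue.Empty instead of blocking. Here is a minimal sketch of your checker rewritten that way (with the caveat that with multiprocessing.Queue an item put by another process may not be visible immediately, so a worker can still exit while items are in flight; this narrows the race in your original code rather than eliminating it):

import queue  # provides the queue.Empty exception raised by get_nowait()
from multiprocessing import current_process


def checker(q):
    count = 0
    while True:
        try:
            # Check and dequeue in a single atomic call: either an item
            # is returned or queue.Empty is raised; there is no window
            # in between for another process to steal the item.
            data = q.get_nowait()
        except queue.Empty:
            print('Queue is empty now')
            print(current_process().name, '-----', count)
            return
        q.task_done()
        count += 1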

However, I generally prefer communication over locking whenever possible; it's less error-prone and makes one's intentions clearer. In this case, I would send a sentinel value (such as None) to the worker processes to indicate that all work is done. Each worker then just dequeues the next object (which is always thread-safe) and, if the object is None, exits.

The example code below is a simplified version of your program and should work without races:

from multiprocessing import Process, Queue, current_process


def checker(q):
    while True:
        data = q.get()          # blocks until an item is available
        if data is None:        # sentinel: all work is done
            print(f'process {current_process().name} ending')
            return
        else:
            pass                # do work with data here


if __name__ == '__main__':
    q = Queue()
    for i in range(1000):
        q.put(i)
    procs = []
    for _ in range(2):
        q.put(None)             # sentinel value, one per worker
        p = Process(target=checker, args=(q,), daemon=True)
        p.start()
        procs.append(p)
    for proc in procs:
        proc.join()
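
If you want to keep the JoinableQueue and q.join() semantics from your original code, the same sentinel idea carries over; the one extra requirement is that every dequeued item, sentinels included, gets a task_done() call so that q.join() can return. A sketch under that assumption:

from multiprocessing import Process, current_process
from multiprocessing import JoinableQueue as Queue


def checker(q):
    while True:
        data = q.get()
        q.task_done()           # account for every item, sentinels included
        if data is None:        # sentinel: all real work has been consumed
            print(f'process {current_process().name} ending')
            return
        # do work with data here


if __name__ == '__main__':
    q = Queue()
    for i in range(1000):
        q.put(i)
    procs = []
    for _ in range(2):
        q.put(None)             # one sentinel per worker
        p = Process(target=checker, args=(q,), daemon=True)
        p.start()
        procs.append(p)
    q.join()                    # returns once every put() has a matching task_done()
    for p in procs:
        p.join()                # workers have already seen their sentinels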
