
Multiprocessing Queue - child processes sometimes get stuck and are not reaped

First of all, I apologize if the title is a bit weird, but I literally could not think of how to put the problem I am facing into a single line.

So I have the following code:

import time
from multiprocessing import Process, current_process, Manager
from multiprocessing import JoinableQueue as Queue

# from threading import Thread, current_thread
# from queue import Queue


def checker(q):
    count = 0
    while True:
        if not q.empty():
            data = q.get()
            # print(f'{data} fetched by {current_process().name}')
            # print(f'{data} fetched by {current_thread().name}')
            q.task_done()
            count += 1
        else:
            print('Queue is empty now')
            print(current_process().name, '-----', count)
            # print(current_thread().name, '-----', count)


if __name__ == '__main__':
    t = time.time()
    # m = Manager()
    q = Queue()
    # with open("/tmp/c.txt") as ifile:
    #     for line in ifile:
    #         q.put((line.strip()))
    for i in range(1000):
        q.put(i)
    time.sleep(0.1)
    procs = []
    for _ in range(2):
        p = Process(target=checker, args=(q,), daemon=True)
        # p = Thread(target=checker, args=(q,))
        p.start()
        procs.append(p)
    q.join()
    for p in procs:
        p.join()

Sample outputs

1: When the process just hangs

Queue is empty now
Process-2 ----- 501
output hangs at this point

2: When everything works just fine.

Queue is empty now
Process-1 ----- 515
Queue is empty now
Process-2 ----- 485

Process finished with exit code 0

The behavior is intermittent: it happens sometimes, but not always.

I have also tried Manager().Queue() in place of multiprocessing.Queue(), but with no success; both exhibit the same issue.

I tested this with both multiprocessing and multithreading and got exactly the same behavior, with one slight difference: with multithreading the problem occurs much less often than with multiprocessing.

So I think there is something I am missing conceptually or doing wrong, but I am not able to catch it. I have spent way too much time on this, and my mind is probably overlooking something very basic.

Any help is appreciated.

I believe you have a race condition in the checker function. You check whether the queue is empty and then dequeue the next item in separate steps. It is usually a bad idea to separate these two operations without mutual exclusion or locking, because the state of the queue may change between the check and the get: the queue may be non-empty when a process checks it, but another process may dequeue the waiting item first, leaving the first process blocked forever in q.get(). That is why your program sometimes hangs.
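For what it's worth, the check and the get can also be collapsed into a single atomic attempt with get_nowait(), which either returns an item or raises queue.Empty in one step. Below is a minimal sketch of that variant, assuming q is the JoinableQueue from your program. Be aware that with multiprocessing queues even this is not airtight: get_nowait() can raise Empty while another process's items are still being flushed through the underlying pipe, so an "empty" result does not prove the work is finished.

import queue
from multiprocessing import current_process

def checker(q):
    count = 0
    while True:
        try:
            # Atomic attempt: either returns an item or raises queue.Empty;
            # no other process can grab the item between a check and a get.
            data = q.get_nowait()
        except queue.Empty:
            print('Queue is empty now')
            print(current_process().name, '-----', count)
            return
        q.task_done()
        count += 1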

However, I generally prefer communication over locking whenever possible; it's less error prone and makes one's intentions clearer. In this case, I would send a sentinel value to the worker processes (such as None) to indicate that all work is done. Each worker then just dequeues the next object (which is always thread-safe), and, if the object is None, the sub-process exits.

The example code below is a simplified version of your program, and should work without races:

from multiprocessing import Process, Queue, current_process

def checker(q):
    while True:
        data = q.get()  # blocking get; waits until an item is available
        if data is None:  # sentinel: all work has been handed out
            print(f'process {current_process().name} ending')
            return
        else:
            pass  # do work

if __name__ == '__main__':
    q = Queue()
    for i in range(1000):
        q.put(i)
    procs = []
    for _ in range(2):
        q.put(None) # Sentinel value
        p = Process(target=checker, args=(q,), daemon=True)
        p.start()
        procs.append(p)
    for proc in procs:
        proc.join()
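A note on the design: exactly one sentinel is enqueued per worker. Because the queue is FIFO, both None values sit behind the 1000 real items, and each worker consumes exactly one sentinel before returning, so no worker can starve another of its shutdown signal. And since the workers now terminate on their own, the daemon=True flag is no longer essential; the plain proc.join() calls give a clean shutdown.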
