Python Multiprocessing: The child process finished but did not join

I am trying to write multiprocessing code that builds a dictionary of results.

Here is my logic:

import time
from multiprocessing import Manager, Queue, Process

input_list = Queue()
for x in my_input:  # my_input is my input data
  input_list.put(x)

output = Manager().dict()

def job():
  while input_list.qsize()>0:
    x = input_list.get()
    result = my_func(x)  # Do something here
    output[x] = result

def monitor():
  while True:
    if input_list.qsize() > 0:
      time.sleep(1)  # items are still queued; check again later
    else:
      print("Item List is Empty")
      print("Does all the result being save?", len(output.keys()) == len(my_input))
      break

job_list = [Process(target=monitor)]
for _ in range(num_of_worker):
  job_list.append(Process(target=job))
for j in job_list:
  j.start()
for j in job_list:
  j.join()

print("The script is finished")

The logic of my code is quite simple.

  • Initialize a queue and put my input in.
  • Define two functions: job (processes an item and saves the result to a dict) and monitor (reports when the queue has been emptied and whether every result has been saved).
  • Then the standard multiprocessing start and join.

The output I am getting:

Item List is Empty
Does all the result being save? True
...

Some child process has not finished, so it is never joined. The script hangs here and never prints "The script is finished".

The script gets stuck at the join statement, even though the monitor reports that everything is finished (it checks the number of items left in input_list and the number of results stored in output).

Moreover, the hang is not consistently reproducible. If the script is stuck for more than 5 minutes, I terminate it manually and restart it. It finishes properly only about 3 times out of 10.

What could be happening?

Remark: Since I suspected that some child process was not joining, I tried using an Event. When the monitor finds that input_list is empty and output is completely filled, it kills all the processes. But the script then gets stuck at the event trigger instead. (As above, it does not get stuck every time; it works about 3 times out of 10.)

@Homer512's comment gave me insight into where the mistake in the code is.

Switch from

def job():
  while input_list.qsize() > 0:
    x = input_list.get()  # blocks forever if another worker took the last item
    ...

to

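from queue import Empty  # raised by get() when the timeout expires

def job():
  while input_list.qsize() > 0:
    try:
      x = input_list.get(True, 5)  # block for at most 5 seconds
      ...
    except Empty:
      return 0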

My script got stuck at join because, when input_list has only one element left, several workers can pass the while check in job, but only one of them actually gets the item from the queue. The others block forever in get, because it was called without a timeout.
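
For reference, here is a minimal, self-contained sketch of the timeout-based fix described above. Several things in it are assumptions and not part of the original script: my_func is a hypothetical stand-in that squares its input, run is a hypothetical helper name, the queue and dict are passed to the workers as arguments instead of being read as globals, and the timeout defaults to 1 second. It also drops the qsize() check and relies only on the get timeout, since qsize() is approximate and is not implemented on some platforms such as macOS.

```python
import multiprocessing as mp
from queue import Empty  # multiprocessing.Queue.get raises queue.Empty on timeout


def my_func(x):
    # Hypothetical stand-in for the real per-item work.
    return x * x


def job(input_list, output, timeout):
    # Never block forever: if nothing arrives within `timeout` seconds,
    # assume the queue has been drained and let this worker exit.
    while True:
        try:
            x = input_list.get(timeout=timeout)
        except Empty:
            return
        output[x] = my_func(x)


def run(my_input, num_of_worker=4, timeout=1.0):
    # Assumption: prefer "fork" where it exists so the sketch also runs when
    # pasted into an interactive session; otherwise use the platform default.
    method = "fork" if "fork" in mp.get_all_start_methods() else None
    ctx = mp.get_context(method)

    with ctx.Manager() as manager:
        output = manager.dict()
        input_list = ctx.Queue()
        for x in my_input:
            input_list.put(x)

        workers = [
            ctx.Process(target=job, args=(input_list, output, timeout))
            for _ in range(num_of_worker)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()  # returns: every worker exits once its get() times out
        return dict(output)  # copy the results out before the manager shuts down


if __name__ == "__main__":
    print(run(range(10)))
    print("The script is finished")
```

The trade-off is that every worker idles for one timeout period after the queue is drained before it exits, and a producer that goes quiet for longer than the timeout would cause workers to quit early. In this script the queue is filled completely before the workers start, so that second case should not arise.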
