简体   繁体   中英

Zombie state multiprocessing library python3

My question concerns a replacement of join() function to avoid a defunct or zombie state of already terminated processes when using the multiprocessing library of python3. Is there an alternative which may suspend the child processes from being terminated until they get the green light from the main process? This allows them to terminate correctly without going into a zombie state?

I prepared a quick illustration using the following code which launches 20 different processes, the first process takes 10 seconds work of load and all others take 3 seconds work of load:

import os
import sys
import time
import multiprocessing as mp
from multiprocessing import Process

def exe(i):
    print(i)    
    if i == 1:
        time.sleep(10)
    else:
        time.sleep(3)
procs = []
for i in range(1,20):
    proc = Process(target=exe, args=(i,))
    proc.start()
    procs.append(proc)

for proc in procs:
    print(proc) # <-- I'm blocked to join others till the first process finishes its work load
    proc.join()

print("finished")

If you launch the script, you will see that all the other processes go to into a zombie state until the join() function is released from the first process. This could make the system unstable or overloaded!

Thanks

Per this thread , Marko Rauhamaa writes:

If you don't care to know when child processes exit, you can simply ignore the SIGCHLD signal:

 import signal signal.signal(signal.SIGCHLD, signal.SIG_IGN)

That will prevent zombies from appearing.

The wait(2) man page explains:

POSIX.1-2001 specifies that if the disposition of SIGCHLD is set to SIG_IGN or the SA_NOCLDWAIT flag is set for SIGCHLD (see sigaction(2)), then children that terminate do not become zombies and a call to wait() or waitpid() will block until all children have terminated, and then fail with errno set to ECHILD. (The original POSIX standard left the behavior of setting SIGCHLD to SIG_IGN unspecified. Note that even though the default disposition of SIGCHLD is "ignore", explicitly setting the disposition to SIG_IGN results in different treatment of zombie process children.)

Linux 2.6 conforms to the POSIX requirements. However, Linux 2.4 (and earlier) does not: if a wait() or waitpid() call is made while SIGCHLD is being ignored, the call behaves just as though SIGCHLD were not being ignored, that is, the call blocks until the next child terminates and then returns the process ID and status of that child.

So if you are using Linux 2.6 or a POSIX-compliant OS, using the above code will allow children processes to exit without becoming zombies. If you are not using a POSIX-compliant OS, then the thread above offers a number of options. Below is one alternative, somewhat similar to Marko Rauhamaa's third suggestion .


If for some reason you need to know when children processes exit and wish to handle (at least some of them) differently, then you could set up a queue to allow the child processes to signal the main process when they are done. Then the main process can call the appropriate join in the order in which it receives items from the queue:

import time
import multiprocessing as mp

def exe(i, q):
    try:
        print(i)    
        if i == 1:
            time.sleep(10)
        elif i == 10:
            raise Exception('I quit')
        else:
            time.sleep(3)
    finally:
        q.put(mp.current_process().name)

if __name__ == '__main__':
    procs = dict()
    q = mp.Queue()
    for i in range(1,20):
        proc = mp.Process(target=exe, args=(i, q))
        proc.start()
        procs[proc.name] = proc

    while procs:
        name = q.get()
        proc = procs[name]
        print(proc) 
        proc.join()
        del procs[name]

    print("finished")

yields a result like

...    
<Process(Process-10, stopped[1])>  # <-- process with exception still gets joined
19
<Process(Process-2, started)>
<Process(Process-4, stopped)>
<Process(Process-6, started)>
<Process(Process-5, stopped)>
<Process(Process-3, stopped)>
<Process(Process-9, started)>
<Process(Process-7, stopped)>
<Process(Process-8, started)>
<Process(Process-13, started)>
<Process(Process-12, stopped)>
<Process(Process-11, stopped)>
<Process(Process-16, started)>
<Process(Process-15, stopped)>
<Process(Process-17, stopped)>
<Process(Process-14, stopped)>
<Process(Process-18, started)>
<Process(Process-19, stopped)>
<Process(Process-1, started)>      # <-- Process-1 ends last
finished

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM