Programmatically check for zombie child process in Linux using C

I have written a simple C program on RedHat Linux which forks a child, has the child call execv, and waits for it in the parent using waitpid.

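#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main( int argc, char * argv[] )
{
    pid_t pid;
    int status = 0;
    pid_t wait_ret;
    const char * process_path;

    if ( argc < 2 )
    {
        exit( EXIT_FAILURE );
    }

    process_path = argv[1]; //read only after argc has been checked

    pid = fork(); //spawn child process

    if ( -1 == pid ) //fork failed, no child exists
    {
        fprintf( stderr, "ERROR: fork failed: %s\n", strerror( errno ) );
        exit( EXIT_FAILURE );
    }

    if ( 0 == pid ) //child
    {
        //execv only returns on failure
        execv( process_path, &argv[1] );
        fprintf( stderr, "execv failed: %s\n", strerror( errno ) );

        _exit( EXIT_FAILURE ); //report the failure to the parent
    }

    //wait for the child to terminate
    wait_ret = waitpid( pid, &status, WUNTRACED );

    if ( -1 == wait_ret )
    {
        printf( "ERROR: Failed to wait for process termination\n" );
        exit( EXIT_FAILURE );
    }

    // ... handlers for child exit status ...

    return 0;
}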

I am using this as a simple watchdog for some processes I am running.

My problem is that one process in particular is not being reaped by waitpid upon exiting and instead remains forever in a Zombie state while waitpid is hung. I am not sure why waitpid is unable to reap this process once it becomes a Zombie (maybe a leaked file descriptor or something).

I could use the WNOHANG flag and poll the child's stat proc file to check for the Zombie state but I would prefer a more elegant solution. Maybe there is some function that I could use to get the Zombie status from without polling this file?

Does anyone know an alternative to waitpid which WILL return when the process becomes a Zombie?

Additional Information:

The child process is being closed by a call to exit( EXIT_FAILURE); in one of its threads.

cat /proc/<CHILD_PID>/stat (before exit):

1037 (my_program) S 1035 58 58 0 -1 4194560 1309 0 22 0 445 1749 0 0 20 0 13 0 4399 22347776 1136 4294967295 3336716288 3338455332 3472776112 3472775232 3335760920 0 0 4 31850 4294967295 0 0 17 0 0 0 26 0 0 3338489412 3338507560 3338600448

cat /proc/<CHILD_PID>/stat (after exit):

1037 (my_program) Z 1035 58 58 0 -1 4227340 1316 0 22 0 464 1834 0 0 20 0 2 0 4399 0 0 4294967295 0 0 0 0 0 0 0 4 31850 4294967295 0 0 17 0 0 0 26 0 0 0 0 0

Note that the child PID is 1037 and the parent PID is 1035 in this case.
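
In both dumps the process state is the third field (S before exit, Z after). As a minimal sketch of the polling approach described above, the state character could be extracted as follows. The names stat_line_state and proc_state are hypothetical helpers invented for this illustration, and the parsing assumes the Linux /proc stat layout in which the command name is wrapped in parentheses.

```c
#include <stdio.h>
#include <string.h>

/* Return the state character (third field) of a /proc/<pid>/stat line,
 * or '\0' if the line is malformed. The command name in the second field
 * may itself contain spaces or ')', so look for the LAST ')' instead of
 * splitting on spaces. */
char stat_line_state(const char *line)
{
    const char *close = strrchr(line, ')');

    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';
    return close[2];
}

/* Read /proc/<pid>/stat and return the state character, or '\0' on error
 * (for example when the process no longer exists). */
char proc_state(long pid)
{
    char path[64];
    char buf[1024];
    char state = '\0';
    FILE *fp;

    snprintf(path, sizeof path, "/proc/%ld/stat", pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return '\0';
    if (fgets(buf, sizeof buf, fp) != NULL)
        state = stat_line_state(buf);
    fclose(fp);
    return state;
}
```

A watchdog could compare proc_state(pid) against 'Z', although this still amounts to polling a file.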

Regarding "My problem is that one process in particular is not being reaped by waitpid upon exiting and instead remains forever in a Zombie state while waitpid is hung": if I understand correctly, you don't want the child to become a zombie. In that case, use the SA_NOCLDWAIT flag. From the manual page of sigaction():

SA_NOCLDWAIT (since Linux 2.6) If signum is SIGCHLD, do not transform children into zombies when they terminate. See also waitpid(2). This flag is meaningful only when establishing a handler for SIGCHLD, or when setting that signal's disposition to SIG_DFL.

 If the SA_NOCLDWAIT flag is set when establishing a handler for SIGCHLD, POSIX.1 leaves it unspecified whether a SIGCHLD signal is generated when a child process terminates. On Linux, a SIGCHLD signal is generated in this case; on some other implementations, it is not.

The idea is this: when the child terminates before the parent, the parent receives SIGCHLD (signal number 17 on x86 Linux) and the child becomes a zombie, because the parent is still running and has not yet waited for it. To have the child removed as soon as it terminates instead of lingering as a zombie, set the SA_NOCLDWAIT flag.

Here is the sample code

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

void my_isr(int n) {
        (void)n; /* error handling */
}
int main(void) {
        if(fork()==0) { /* child process */
                printf("In child process ..c_pid: %d and p_pid : %d\n",(int)getpid(),(int)getppid());
                sleep(5);
                printf("sleep over .. now exiting \n");
        }
        else { /*parent process */
                struct sigaction v = {0}; /* zero every field before filling it in */
                v.sa_handler=my_isr;/* SET THE HANDLER TO ISR */
                v.sa_flags=SA_NOCLDWAIT; /* it will not let child to become zombie */
                sigemptyset(&v.sa_mask);
                sigaction(SIGCHLD,&v,NULL);/* when parent receives SIGCHLD, IT GETS CALLED */
                while(1) pause(); /* for observation purposes: keeps the parent alive without spinning */
        }
        return 0;
}

Comment out or restore the v.sa_flags=SA_NOCLDWAIT; line and compare the behavior: run a.out in one terminal and check ps -el | grep pts/0 in another terminal.

Regarding "Does anyone know an alternative to waitpid which WILL return when the process becomes a Zombie?": use WNOHANG, as you suggested. Note also what the manual page of waitpid() says about the WUNTRACED flag you are passing:

WUNTRACED also return if a child has stopped (but not traced via ptrace(2)). Status for traced children which have stopped is provided even if this option is not specified.
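
A minimal sketch of the WNOHANG approach, assuming a fixed polling interval is acceptable for the watchdog. The function name poll_child and its max_polls and interval_ms parameters are hypothetical; only waitpid and nanosleep are real library calls.

```c
/* Request POSIX declarations (nanosleep) in strict ISO C mode. */
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Poll for the termination of pid without blocking inside waitpid().
 * Returns 1 if the child was reaped (status is filled in), 0 if it was
 * still running after max_polls attempts, and -1 on a waitpid() error. */
int poll_child(pid_t pid, int *status, int max_polls, long interval_ms)
{
    struct timespec delay;

    delay.tv_sec = interval_ms / 1000;
    delay.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (int i = 0; i < max_polls; i++) {
        pid_t r = waitpid(pid, status, WNOHANG);

        if (r == pid)
            return 1;              /* child terminated and was reaped */
        if (r == -1 && errno != EINTR)
            return -1;             /* real error, for example ECHILD */
        nanosleep(&delay, NULL);   /* r == 0: child has not changed state yet */
    }
    return 0;
}
```

Because the loop sleeps between attempts, it avoids spinning the CPU, at the cost of noticing the exit up to one interval late.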

Any process that terminates becomes a zombie until it is collected by a wait call. Here the wait does not seem to happen in all cases.

From the code given, I can't figure out why the wait does not happen and the process remains a zombie (at least not without running it).

But instead of waiting on a specific pid only, you can wait on any child by using -1 as the first argument to waitpid. Don't use WNOHANG, as it requires busy polling (don't do that).
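
Waiting on any child could look like the sketch below. The function name wait_any_child is a hypothetical helper; retrying on EINTR is an assumption about how the caller wants interrupted waits handled.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Block until any child of the calling process terminates.
 * Returns the pid of the reaped child, or -1 with errno set
 * (ECHILD when there are no children left to wait for). */
pid_t wait_any_child(int *status)
{
    pid_t r;

    do {
        r = waitpid(-1, status, 0);   /* -1 means "any child" */
    } while (r == -1 && errno == EINTR);

    return r;
}
```

The returned pid tells the watchdog which child exited, so one blocking loop can supervise several processes.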

You may also want to drop WUNTRACED unless you have a specific reason to include it. There is no harm in dropping it and seeing what difference it makes.
