Linux: fork & execv, wait for child process hangs

I wrote a helper function to start a process using fork() and execv(), inspired by this answer. It is used to start, e.g., mysqldump to make a database backup. The code works fine in several different places with different programs.

Now I have hit one situation where it fails: a call to systemctl to stop a unit. Running systemctl works and the unit is stopped. But in the intermediate process, wait() does not return when the worker exits; it hangs until the timeout process ends. If I check with kill() whether the worker process has finished, I can tell that it has.

Important: The program does not misbehave or segfault, except that wait() does not signal the end of the worker process! Is there anything incorrect in my code (see below) that could trigger this behavior? I've read "Threads and fork(): think twice before mixing them", but I cannot find anything in there that relates to my problem.

What's strange: Deep, deep, deep in the program JSON-RPC is used. If I deactivate the code using the JSON-RPC everything works fine!?

Environment: The program that uses the function is a multi-threaded application. Signals are blocked for all threads. The main thread handles signals via sigtimedwait().
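For readers unfamiliar with that setup, here is a minimal sketch of it. The helper names blockSignals and waitForSignal are hypothetical and not taken from the production code; the sketch only assumes that the signal set is blocked before any threads are created, so that all threads inherit the mask.

```cpp
#include <pthread.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

// Hypothetical helper: blocks the given signals in the calling thread.
// Threads created afterwards inherit this signal mask.
bool blockSignals(const sigset_t& set) {
    return pthread_sigmask(SIG_BLOCK, &set, nullptr) == 0;
}

// Hypothetical helper: waits up to timeoutSeconds for one of the signals
// in the set. Returns the signal number, or -1 on timeout or error.
int waitForSignal(const sigset_t& set, const time_t timeoutSeconds) {
    timespec timeout{};
    timeout.tv_sec = timeoutSeconds;
    siginfo_t info{};
    return sigtimedwait(&set, &info, &timeout);
}
```

The main thread would call blockSignals() once at startup and then call waitForSignal() in its loop. Blocking a signal only delays its delivery; it does not change the signal's disposition.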

Code (production code in which logging got traded for output via std::cout) with sample main function:

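#include <cerrno>   // errno, ESRCH
#include <csignal>  // kill, SIGKILL
#include <cstdlib>  // EXIT_SUCCESS, EXIT_FAILURE
#include <iostream>

#include <unistd.h>
#include <sys/wait.h>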

namespace {

bool checkStatus(const int status) {
    return( WIFEXITED(status) && ( WEXITSTATUS(status) == 0 ) );
}

}

bool startProcess(const char* const path, const char* const argv[], const unsigned int timeoutInSeconds, pid_t& processId, const int* const fileDescriptor) {
    auto result = true;

    const pid_t intermediatePid = fork();
    if(intermediatePid == 0) {
        // intermediate process
        std::cout << "Intermediate process: Started (" <<  getpid() << ")." << std::endl;
        const pid_t workerPid = fork();
        if(workerPid == 0) {
            // worker process
            if(fileDescriptor) {
                std::cout << "Worker process: Redirecting file descriptor to stdin." << std::endl;
                const auto dupResult = dup2(*fileDescriptor, STDIN_FILENO);
                if(-1 == dupResult) {
                    std::cout << "Worker process: Duplication of file descriptor failed." << std::endl;
                    _exit(EXIT_FAILURE);
                }
            }
            execv(path, const_cast<char**>(argv));

            // Only reached if execv() failed.
            std::cout << "Worker process: execv failed!" << std::endl;
            _exit(EXIT_FAILURE);
        } else if(-1 == workerPid) {
            std::cout << "Intermediate process: Starting worker failed!" << std::endl;
            _exit(EXIT_FAILURE);
        }

        const pid_t timeoutPid = fork();
        if(timeoutPid == 0) {
            // timeout process
            std::cout << "Timeout process: Started (" << getpid() << ")." << std::endl;
            sleep(timeoutInSeconds);
            std::cout << "Timeout process: Finished." << std::endl;
            _exit(EXIT_SUCCESS);
        } else if(-1 == timeoutPid) {
            std::cout << "Intermediate process: Starting timeout process failed." << std::endl;
            kill(workerPid, SIGKILL);
            std::cout << "Intermediate process: Finished." << std::endl;
            _exit(EXIT_FAILURE);
        }

        // ---------------------------------------
        // This code is only used for double checking if the worker is still running.
        // The if condition never evaluated to true in my tests.
        const auto killResult = kill(workerPid, 0);
        if((-1 == killResult) && (ESRCH == errno)) {
            std::cout << "Intermediate process: Worker is not running." << std::endl;
        }
        // ---------------------------------------

        std::cout << "Intermediate process: Waiting for child processes." << std::endl;
        int status = -1;
        const pid_t exitedPid = wait(&status);

        // ---------------------------------------
        // This code is only used for double checking if the worker is still running.
        // The if condition evaluates to true in the case of an error.
        const auto killResult2 = kill(workerPid, 0);
        if((-1 == killResult2) && (ESRCH == errno)) {
            std::cout << "Intermediate process: Worker is not running." << std::endl;
        }
        // ---------------------------------------

        std::cout << "Intermediate process: Child process finished. Status: " <<  status << "." << std::endl;
        if(exitedPid == workerPid) {
            std::cout << "Intermediate process: Killing timeout process." << std::endl;
            kill(timeoutPid, SIGKILL);
        } else {
            std::cout << "Intermediate process: Killing worker process." << std::endl;
            kill(workerPid, SIGKILL);
            std::cout << "Intermediate process: Waiting for worker process to terminate." << std::endl;
            wait(nullptr);
            std::cout << "Intermediate process: Finished." << std::endl;
            _exit(EXIT_FAILURE);
        }
        std::cout << "Intermediate process: Waiting for timeout process to terminate." << std::endl;
        wait(nullptr);
        std::cout << "Intermediate process: Finished." << std::endl;
        _exit(checkStatus(status) ? EXIT_SUCCESS : EXIT_FAILURE);

    } else if(-1 == intermediatePid) {
        // error
        std::cout << "Parent process: Error starting intermediate process!" << std::endl;
        result = false;
    } else {
        // parent process
        std::cout << "Parent process: Intermediate process started. PID: " << intermediatePid << "." << std::endl;
        processId = intermediatePid;
    }

    return(result);
}

bool waitForProcess(const pid_t processId) {
    int status = 0;
    const auto waitResult = waitpid(processId, &status, 0);
    auto result = false;
    if(waitResult == processId) {
        result = checkStatus(status);
    }
    return(result);
}

int main() {
    pid_t pid = 0;
    const char* const path = "/bin/ls";
    const char* argv[] = { "/bin/ls", "--help", nullptr };
    const unsigned int timeoutInS = 5;
    const auto startResult = startProcess(path, argv, timeoutInS, pid, nullptr);
    if(startResult) {
        const auto waitResult = waitForProcess(pid);
        std::cout << "waitForProcess returned " << waitResult << "." << std::endl;
    } else {
        std::cout << "startProcess failed!" << std::endl;
    }
}

Edit

The expected output should contain

  • Intermediate process: Waiting for child processes.
  • Intermediate process: Child process finished. Status: 0.
  • Intermediate process: Killing timeout process.

In the case of error the output looks like this

  • Intermediate process: Waiting for child processes.
  • Intermediate process: Child process finished. Status: -1.
  • Intermediate process: Killing worker process.

When you run the sample code you will most likely see the expected output. I cannot reproduce the incorrect result in a simple example.

I found the problem:

In the mongoose sources (the JSON-RPC library uses mongoose), the function mg_start contains the following code:

#if !defined(_WIN32) && !defined(__SYMBIAN32__)
  // Ignore SIGPIPE signal, so if browser cancels the request, it
  // won't kill the whole process.
  (void) signal(SIGPIPE, SIG_IGN);
  // Also ignoring SIGCHLD to let the OS to reap zombies properly.
  (void) signal(SIGCHLD, SIG_IGN);
#endif // !_WIN32

The line

(void) signal(SIGCHLD, SIG_IGN);

has the effect that

"if the parent does a wait(), this call will return only when all children have exited, and then returns -1 with errno set to ECHILD."

as mentioned here in section 5.5, "Voodoo: wait and SIGCHLD".

This is also described in the man page for WAIT(2)

ERRORS [...]

ECHILD [...] (This can happen for one's own child if the action for SIGCHLD is set to SIG_IGN. See also the Linux Notes section about threads.)

Stupid of me not to check the return value correctly. Before testing

if(exitedPid == workerPid) {

I should have checked that exitedPid != -1.

When I do so, errno gives me ECHILD. Had I known that in the first place, I would have read the man page and probably found the problem faster...
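The effect can be reproduced in isolation. The following is a minimal sketch, not the production code: forkAndWait and WaitOutcome are hypothetical names, and the child simply exits right away. It assumes Linux semantics for an ignored SIGCHLD.

```cpp
#include <cerrno>
#include <cstdlib>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Result of one wait() call: the PID returned and the errno seen on failure.
struct WaitOutcome {
    pid_t pid;
    int error;  // 0 if wait() succeeded
};

// Hypothetical helper: forks a child that exits immediately, then waits for it.
// If ignoreSigchld is true, SIGCHLD is set to SIG_IGN first, which is what
// the quoted mg_start code does. The previous disposition is restored at the end.
WaitOutcome forkAndWait(const bool ignoreSigchld) {
    struct sigaction action {};
    struct sigaction previous {};
    action.sa_handler = ignoreSigchld ? SIG_IGN : SIG_DFL;
    sigemptyset(&action.sa_mask);
    sigaction(SIGCHLD, &action, &previous);

    WaitOutcome outcome{-1, 0};
    const pid_t child = fork();
    if (child == 0) {
        _exit(EXIT_SUCCESS);
    }
    if (child == -1) {
        outcome.error = errno;
    } else {
        int status = 0;
        outcome.pid = wait(&status);
        if (outcome.pid == -1) {
            // Check this before comparing the result against any child PID.
            outcome.error = errno;
        }
    }

    sigaction(SIGCHLD, &previous, nullptr);
    return outcome;
}
```

With ignoreSigchld set to true, wait() returns -1 and errno is ECHILD; with false, it returns the PID of the child.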

It is naughty of mongoose to change signal handling regardless of what the application wants. Additionally, mongoose does not restore the previous signal handling when it is stopped with mg_stop.
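One possible workaround, sketched here under assumptions, is to save the SIGCHLD disposition before the library is initialized and restore it afterwards. SigchldDispositionGuard and initLibraryKeepingSigchld are hypothetical names, and libraryInitThatIgnoresSigchld is only a stand-in for the mg_start behavior quoted above, not a real mongoose function. Whether mongoose itself still works as intended once SIGCHLD is restored (for example, when it spawns its own child processes) is something I have not verified.

```cpp
#include <signal.h>

// Hypothetical RAII guard: remembers the SIGCHLD disposition on construction
// and puts it back on destruction.
class SigchldDispositionGuard {
public:
    SigchldDispositionGuard() { sigaction(SIGCHLD, nullptr, &saved_); }
    ~SigchldDispositionGuard() { sigaction(SIGCHLD, &saved_, nullptr); }
    SigchldDispositionGuard(const SigchldDispositionGuard&) = delete;
    SigchldDispositionGuard& operator=(const SigchldDispositionGuard&) = delete;

private:
    struct sigaction saved_ {};
};

// Stand-in for a library initializer that ignores SIGCHLD as a side effect.
void libraryInitThatIgnoresSigchld() {
    signal(SIGCHLD, SIG_IGN);
}

// Returns true if SIGCHLD is currently set to SIG_IGN.
bool sigchldIsIgnored() {
    struct sigaction current {};
    sigaction(SIGCHLD, nullptr, &current);
    return current.sa_handler == SIG_IGN;
}

// Runs the initializer, then restores the previous SIGCHLD disposition
// when the guard goes out of scope.
void initLibraryKeepingSigchld() {
    SigchldDispositionGuard guard;
    libraryInitThatIgnoresSigchld();
}
```

An alternative with the same effect for the helper in the question would be to reset SIGCHLD to SIG_DFL in the intermediate process right after the first fork(), before the worker and timeout processes are started.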

Additional info: The code that caused this problem was changed in mongoose in September 2013 with this commit.

We faced a similar issue in our application: under heavy load with repeated fork() calls, a child process never returned. One option is to monitor the PID of the child process and, if it has not returned within an application-defined threshold, terminate it by sending a SIGKILL or SIGTERM signal.
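A minimal sketch of such monitoring, assuming the child can be polled with waitpid() and WNOHANG: waitWithTimeout and spawnSleeper are hypothetical names, the 10 ms poll interval is arbitrary, and SIGKILL is used for simplicity where an application might prefer to try SIGTERM first. It also assumes SIGCHLD is not set to SIG_IGN, because otherwise waitpid() fails with ECHILD as described above.

```cpp
#include <cerrno>
#include <chrono>
#include <cstdlib>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: polls the child until it exits or the timeout elapses.
// On timeout the child is killed with SIGKILL and reaped.
// Returns true only if the child ended on its own within the timeout;
// status receives the wait status whenever the child could be reaped.
bool waitWithTimeout(const pid_t child, const std::chrono::milliseconds timeout, int& status) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline) {
        const pid_t result = waitpid(child, &status, WNOHANG);
        if (result == child) {
            return true;
        }
        if (result == -1 && errno != EINTR) {
            return false;  // e.g. ECHILD: nothing to wait for
        }
        usleep(10 * 1000);  // poll every 10 ms
    }
    kill(child, SIGKILL);
    while (waitpid(child, &status, 0) == -1 && errno == EINTR) {
        // retry if interrupted by a signal
    }
    return false;
}

// Hypothetical demo helper: forks a child that sleeps and then exits.
pid_t spawnSleeper(const unsigned int seconds) {
    const pid_t child = fork();
    if (child == 0) {
        sleep(seconds);
        _exit(EXIT_SUCCESS);
    }
    return child;
}
```

For example, waitWithTimeout(spawnSleeper(30), std::chrono::milliseconds(200), status) returns false after roughly 200 ms and leaves a status for which WIFSIGNALED(status) is true.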
