
Linux: fork & execv, wait for child process hangs

I wrote a helper function that starts a process using fork() and execv(), inspired by this answer. It is used to start, e.g., mysqldump to make a database backup. The code works fine in several different places with different programs.

Now I have hit one situation where it fails: a call to systemctl to stop a unit. Running systemctl works, and the unit is stopped. But in the intermediate process, wait() does not return when the worker exits; it hangs until the timeout process ends. If I check with kill() whether the worker process has finished, I can tell that it has.

Important: The program does not misbehave or segfault, apart from wait() not signaling the end of the worker process! Is there anything incorrect in my code (see below) that could trigger this behavior? I've read "Threads and fork(): think twice before mixing them", but I cannot find anything in there that relates to my problem.

What's strange: deep inside the program, JSON-RPC is used. If I deactivate the code that uses JSON-RPC, everything works fine!?

Environment: The program that uses the function is a multi-threaded application. Signals are blocked for all threads. The main thread handles signals via sigtimedwait().

Code (production code in which logging has been replaced by output via std::cout) with a sample main function:

#include <cerrno>
#include <cstdlib>
#include <iostream>

#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>

namespace {

bool checkStatus(const int status) {
    return( WIFEXITED(status) && ( WEXITSTATUS(status) == 0 ) );
}

}

bool startProcess(const char* const path, const char* const argv[], const unsigned int timeoutInSeconds, pid_t& processId, const int* const fileDescriptor) {
    auto result = true;

    const pid_t intermediatePid = fork();
    if(intermediatePid == 0) {
        // intermediate process
        std::cout << "Intermediate process: Started (" <<  getpid() << ")." << std::endl;
        const pid_t workerPid = fork();
        if(workerPid == 0) {
            // worker process
            if(fileDescriptor) {
                std::cout << "Worker process: Redirecting file descriptor to stdin." << std::endl;
                const auto dupResult = dup2(*fileDescriptor, STDIN_FILENO);
                if(-1 == dupResult) {
                    std::cout << "Worker process: Duplication of file descriptor failed." << std::endl;
                    _exit(EXIT_FAILURE);
                }
            }
            execv(path, const_cast<char**>(argv));

            // execv() only returns if it failed.
            std::cout << "Worker process: execv failed!" << std::endl;
            _exit(EXIT_FAILURE);
        } else if(-1 == workerPid) {
            std::cout << "Intermediate process: Starting worker failed!" << std::endl;
            _exit(EXIT_FAILURE);
        }

        const pid_t timeoutPid = fork();
        if(timeoutPid == 0) {
            // timeout process
            std::cout << "Timeout process: Started (" << getpid() << ")." << std::endl;
            sleep(timeoutInSeconds);
            std::cout << "Timeout process: Finished." << std::endl;
            _exit(EXIT_SUCCESS);
        } else if(-1 == timeoutPid) {
            std::cout << "Intermediate process: Starting timeout process failed." << std::endl;
            kill(workerPid, SIGKILL);
            std::cout << "Intermediate process: Finished." << std::endl;
            _exit(EXIT_FAILURE);
        }

        // ---------------------------------------
        // This code is only used for double checking if the worker is still running.
        // The if condition never evaluated to true in my tests.
        const auto killResult = kill(workerPid, 0);
        if((-1 == killResult) && (ESRCH == errno)) {
            std::cout << "Intermediate process: Worker is not running." << std::endl;
        }
        // ---------------------------------------

        std::cout << "Intermediate process: Waiting for child processes." << std::endl;
        int status = -1;
        const pid_t exitedPid = wait(&status);

        // ---------------------------------------
        // This code is only used for double checking if the worker is still running.
        // The if condition evaluates to true in the case of an error.
        const auto killResult2 = kill(workerPid, 0);
        if((-1 == killResult2) && (ESRCH == errno)) {
            std::cout << "Intermediate process: Worker is not running." << std::endl;
        }
        // ---------------------------------------

        std::cout << "Intermediate process: Child process finished. Status: " <<  status << "." << std::endl;
        if(exitedPid == workerPid) {
            std::cout << "Intermediate process: Killing timeout process." << std::endl;
            kill(timeoutPid, SIGKILL);
        } else {
            std::cout << "Intermediate process: Killing worker process." << std::endl;
            kill(workerPid, SIGKILL);
            std::cout << "Intermediate process: Waiting for worker process to terminate." << std::endl;
            wait(nullptr);
            std::cout << "Intermediate process: Finished." << std::endl;
            _exit(EXIT_FAILURE);
        }
        std::cout << "Intermediate process: Waiting for timeout process to terminate." << std::endl;
        wait(nullptr);
        std::cout << "Intermediate process: Finished." << std::endl;
        _exit(checkStatus(status) ? EXIT_SUCCESS : EXIT_FAILURE);

    } else if(-1 == intermediatePid) {
        // error
        std::cout << "Parent process: Error starting intermediate process!" << std::endl;
        result = false;
    } else {
        // parent process
        std::cout << "Parent process: Intermediate process started. PID: " << intermediatePid << "." << std::endl;
        processId = intermediatePid;
    }

    return(result);
}

bool waitForProcess(const pid_t processId) {
    int status = 0;
    const auto waitResult = waitpid(processId, &status, 0);
    auto result = false;
    if(waitResult == processId) {
        result = checkStatus(status);
    }
    return(result);
}

int main() {
    pid_t pid = 0;
    const char* const path = "/bin/ls";
    const char* argv[] = { "/bin/ls", "--help", nullptr };
    const unsigned int timeoutInS = 5;
    const auto startResult = startProcess(path, argv, timeoutInS, pid, nullptr);
    if(startResult) {
        const auto waitResult = waitForProcess(pid);
        std::cout << "waitForProcess returned " << waitResult << "." << std::endl;
    } else {
        std::cout << "startProcess failed!" << std::endl;
    }
}

Edit

The expected output should contain:

  • Intermediate process: Waiting for child processes.
  • Intermediate process: Child process finished. Status: 0.
  • Intermediate process: Killing timeout process.

In the case of the error, the output looks like this:

  • Intermediate process: Waiting for child processes.
  • Intermediate process: Child process finished. Status: -1.
  • Intermediate process: Killing worker process.

When you run the sample code, you will most likely see the expected output. I cannot reproduce the incorrect result in a simple example.

I found the problem:

In the mongoose sources (JSON-RPC uses mongoose), in the function mg_start, I found the following code:

#if !defined(_WIN32) && !defined(__SYMBIAN32__)
  // Ignore SIGPIPE signal, so if browser cancels the request, it
  // won't kill the whole process.
  (void) signal(SIGPIPE, SIG_IGN);
  // Also ignoring SIGCHLD to let the OS to reap zombies properly.
  (void) signal(SIGCHLD, SIG_IGN);
#endif // !_WIN32

The line

(void) signal(SIGCHLD, SIG_IGN);

has the following effect:

"if the parent does a wait(), this call will return only when all children have exited, and then returns -1 with errno set to ECHILD."

This is mentioned here, in section 5.5, "Voodoo: wait and SIGCHLD".

This is also described in the man page for wait(2):

ERRORS [...]

ECHILD [...] (This can happen for one's own child if the action for SIGCHLD is set to SIG_IGN. See also the Linux Notes section about threads.)
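The effect described in the man page can be reproduced in isolation. The following is a minimal sketch, not code from the application: errnoAfterWait is a hypothetical helper that sets the SIGCHLD disposition, forks a child that exits immediately, and reports what wait() does. It assumes Linux and that the calling process has no other children.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a short-lived child and returns the errno seen
// after wait(), or 0 if wait() reaped the child successfully.
// ignoreSigchld selects SIG_IGN or SIG_DFL for SIGCHLD before forking.
int errnoAfterWait(const bool ignoreSigchld) {
    struct sigaction action {};
    action.sa_handler = ignoreSigchld ? SIG_IGN : SIG_DFL;
    sigemptyset(&action.sa_mask);
    struct sigaction previous {};
    sigaction(SIGCHLD, &action, &previous);

    int observed = 0;
    const pid_t child = fork();
    if (child == 0) {
        _exit(0);                       // child: exit immediately
    } else if (child == -1) {
        observed = errno;               // fork itself failed
    } else {
        int status = 0;
        errno = 0;
        const pid_t reaped = wait(&status);
        observed = (reaped == -1) ? errno : 0;
    }

    // Put the previous disposition back so the caller is not affected.
    sigaction(SIGCHLD, &previous, nullptr);
    return observed;
}
```

With SIGCHLD ignored, the call reports ECHILD even though the child really did run and exit; with the default disposition it reports 0.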

It was stupid of me not to check the return value correctly. Before testing

if(exitedPid == workerPid) {

I should have checked that exitedPid != -1.

If I do so, errno gives me ECHILD. Had I known that in the first place, I would have read the man page and probably found the problem faster...
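A checked version of the wait() call might look like the sketch below. waitChecked is a hypothetical helper name, not part of the original code; it retries on EINTR and logs the reason when wait() returns -1, so an ECHILD caused by an ignored SIGCHLD becomes visible immediately.

```cpp
#include <cerrno>
#include <cstring>
#include <iostream>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical helper: waits for any child and reports failures explicitly.
// Returns the PID of the reaped child, or -1 after logging the reason.
pid_t waitChecked(int& status) {
    pid_t exitedPid = -1;
    do {
        exitedPid = wait(&status);
    } while ((exitedPid == -1) && (errno == EINTR));  // retry if interrupted

    if (exitedPid == -1) {
        const int error = errno;  // save errno before doing any output
        if (error == ECHILD) {
            std::cout << "wait(): no child to wait for (is SIGCHLD ignored?)" << std::endl;
        } else {
            std::cout << "wait() failed: " << std::strerror(error) << std::endl;
        }
    }
    return exitedPid;
}
```

In startProcess, the intermediate process could call this in place of the bare wait(&status) and treat a result of -1 as its own error case before comparing the result with workerPid.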

It is naughty of mongoose to change the signal handling regardless of what the application wants to do about it. Additionally, mongoose does not revert the change to the signal handling when it is stopped with mg_stop.

Additional info: The code that caused this problem was changed in mongoose in September 2013 with this commit.
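Because signal dispositions are inherited across fork(), one possible workaround is to reset SIGCHLD to its default in the intermediate process before it forks the worker. This is only a sketch, assuming nothing else in that process relies on SIGCHLD being ignored; both function names are hypothetical.

```cpp
#include <signal.h>

// Hypothetical helper: resets SIGCHLD to the default disposition so that
// wait() reports exited children again. Returns true on success.
bool restoreDefaultSigchld() {
    struct sigaction action {};
    action.sa_handler = SIG_DFL;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    return sigaction(SIGCHLD, &action, nullptr) == 0;
}

// Hypothetical helper: reports whether SIGCHLD is currently ignored,
// e.g. because a library called signal(SIGCHLD, SIG_IGN).
bool isSigchldIgnored() {
    struct sigaction current {};
    if (sigaction(SIGCHLD, nullptr, &current) != 0) {
        return false;  // query failed; treat as "not ignored"
    }
    return current.sa_handler == SIG_IGN;
}
```

Calling restoreDefaultSigchld() as the first statement of the intermediate process branch only affects that process and its children; the parent application keeps whatever disposition the library installed.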

We faced a similar issue in our application: under heavy load with repeated fork() calls, the child process never returned. One can monitor the PID of the child process and, if it has not returned within an application-defined threshold, terminate it by sending it a KILL or TERM signal.
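The monitoring approach described above could be sketched as follows. waitWithDeadline is a hypothetical helper, and the 10 ms polling interval is an arbitrary choice; it polls with waitpid() and WNOHANG instead of using a separate timeout process.

```cpp
#include <cerrno>
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

// Hypothetical helper: waits up to `timeout` for `child` to exit and kills it
// with SIGKILL once the deadline has passed.
// Returns true only if the child exited on its own with exit status 0.
bool waitWithDeadline(const pid_t child, const std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    int status = 0;
    for (;;) {
        const pid_t result = waitpid(child, &status, WNOHANG);
        if (result == child) {
            return WIFEXITED(status) && (WEXITSTATUS(status) == 0);
        }
        if ((result == -1) && (errno != EINTR)) {
            return false;  // e.g. ECHILD: nothing to wait for
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            kill(child, SIGKILL);
            waitpid(child, &status, 0);  // reap the killed child
            return false;
        }
        const timespec pause{0, 10 * 1000 * 1000};  // sleep 10 ms between polls
        nanosleep(&pause, nullptr);
    }
}
```

Note that if SIGCHLD is ignored, waitpid() fails with ECHILD once the child has exited, so this helper returns false in that case rather than hanging.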
