Why is my semaphore on a forked process not being released?

I have a problem with a POSIX semaphore being used to release a forked process. The forked process is started by launching another instance of the running program after a fork and exec. Sometimes the child is released and other times it is not.

It is a POSIX named semaphore in shared memory, and the weird thing is that it works sometimes. I checked the other solutions out there and they did not help me.

void init()
{
    ...
    sem_unlink(sem_name.c_str());

    if (parent_process)
    {
        sem_t* semaphore = sem_open(sem_name.c_str(), O_CREAT | O_RDWR, 0);
        if (SEM_FAILED == semaphore)
        {
            display_error();
        }
        sem_close(semaphore);
    }

    child_pid = fork();

    if (child_pid == -1)
    {
        display_error();
    }
    else if (child_pid == 0)
    {
        int ret = execve(program_name, args, env);
        if (ret == -1)
        {
            display_error();
        }
    }
    else
    {
        // rest of code
    }
    ...
}

I had the child process wait to be released in another class that has this function:

void wait_until_released()
{
    if (!parent_process)
    {
        sem_t* semaphore = sem_open(sem_name.c_str(), O_CREAT | O_RDWR, 0);
        if (SEM_FAILED == semaphore)
        {
            display_error();
        }

        sem_wait(semaphore);

        sem_close(semaphore);
        sem_unlink(sem_name.c_str());
    }
}

The post was done in another location in the code:

void release_child()
{
    sem_t* semaphore = sem_open(sem_name.c_str(), O_CREAT | O_RDWR, 0);
    if (SEM_FAILED == semaphore)
    {
        display_error();
    }

    if (sem_post(semaphore) != 0)
    {
        display_error();
    }

    sem_close(semaphore);
    sem_unlink(sem_name.c_str());
}

This problem ultimately occurred because I was calling sem_unlink on the POSIX semaphore before waiting on it in the forked process. sem_unlink removes the semaphore's name immediately, and the semaphore itself is destroyed once every process that has it open has called sem_close. In essence, this prevented my child process from using that instance, so it was never released.
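
To make the unlink behavior concrete, here is a minimal sketch (not from the original post). The function name unlink_detaches_name and whatever semaphore name you pass to it are made up for illustration. It opens a semaphore, unlinks the name, and then opens the same name again with O_CREAT. A post on the second handle should not show up on the first, because the two handles refer to different semaphores. Note that sem_open with O_CREAT takes both a mode and an initial value, so the sketch passes both explicitly.

```cpp
#include <fcntl.h>
#include <semaphore.h>
#include <cassert>
#include <cerrno>
#include <cstdio>

// Hypothetical helper: returns true if a semaphore opened after sem_unlink
// is independent of the one that was opened before the unlink.
bool unlink_detaches_name(const char* name)
{
    assert(name != nullptr && name[0] == '/');  // portable names start with '/'

    sem_unlink(name);  // drop any stale semaphore; an ENOENT failure is fine

    sem_t* first = sem_open(name, O_CREAT | O_EXCL, 0600, 0);
    if (first == SEM_FAILED) { std::perror("sem_open (first)"); return false; }

    sem_unlink(name);  // the name is gone, but `first` stays usable until closed

    // Same name, yet this creates a brand-new semaphore.
    sem_t* second = sem_open(name, O_CREAT | O_EXCL, 0600, 0);
    if (second == SEM_FAILED) {
        std::perror("sem_open (second)");
        sem_close(first);
        return false;
    }

    sem_post(second);  // post only to the new semaphore

    // The old semaphore never saw the post: trywait fails with EAGAIN.
    bool first_still_zero = (sem_trywait(first) == -1 && errno == EAGAIN);
    // The new semaphore did: trywait succeeds.
    bool second_has_post = (sem_trywait(second) == 0);

    sem_close(first);
    sem_close(second);
    sem_unlink(name);
    return first_still_zero && second_has_post;
}
```

This is the situation the child ended up in: it held a semaphore with the expected name, but not the one the parent posted to.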

The code worked only sometimes because it assumes that the child is already waiting to be released by the time we call release_child. That is not guaranteed, which is why it worked some of the time and not all of the time. If we call release_child before the child has called sem_wait, we remove the semaphore completely, and the child creates its own version of the semaphore that never gets posted to.
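
The ordering itself is not the problem; the unlink is. As a rough sketch under that assumption (the helper name post_survives_without_unlink is hypothetical, and both roles run in one process for brevity), a post made before anyone waits is still there when the semaphore is later opened by name, as long as the name has not been unlinked in between:

```cpp
#include <fcntl.h>
#include <semaphore.h>
#include <cassert>
#include <cstdio>

// Hypothetical helper: returns true if a post made before the waiter opens
// the semaphore is still pending when the waiter arrives.
bool post_survives_without_unlink(const char* name)
{
    assert(name != nullptr && name[0] == '/');  // portable names start with '/'

    sem_unlink(name);  // drop any stale semaphore; an ENOENT failure is fine

    // "Parent" role: create the semaphore, post, and close WITHOUT unlinking.
    sem_t* poster = sem_open(name, O_CREAT | O_EXCL, 0600, 0);
    if (poster == SEM_FAILED) { std::perror("sem_open (poster)"); return false; }
    bool posted = (sem_post(poster) == 0);
    sem_close(poster);

    // "Child" role, arriving late: open the existing semaphore, no O_CREAT.
    sem_t* waiter = sem_open(name, 0);
    if (waiter == SEM_FAILED) {
        std::perror("sem_open (waiter)");
        sem_unlink(name);
        return false;
    }

    // The count is kept with the named semaphore, so the earlier post is found.
    bool got_post = (sem_trywait(waiter) == 0);

    sem_close(waiter);
    sem_unlink(name);  // unlink only after the wait has completed
    return posted && got_post;
}
```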

By moving the unlink call after the if statement in the first block of code, I prevented the child process from removing the semaphore before waiting on it. Also, by removing the O_CREAT flag from the sem_open calls in release_child and wait_until_released, and the sem_unlink call from release_child, I prevented the child from creating its own semaphore.
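
Put together, the fixed flow might look roughly like the sketch below. It is an illustration under assumptions, not my actual code: the names child_wait_until_released and run_handshake are invented, and the child here calls the wait function directly after fork instead of going through execve, doing only what the exec'ed program would have to do (open the semaphore by name).

```cpp
#include <fcntl.h>
#include <semaphore.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cstdio>

// Child side: open the existing semaphore (no O_CREAT), block until posted,
// and unlink only after the wait has returned.
int child_wait_until_released(const char* name)
{
    sem_t* sem = sem_open(name, 0);
    if (sem == SEM_FAILED) { std::perror("child sem_open"); return 1; }

    int rc = sem_wait(sem);
    sem_close(sem);
    sem_unlink(name);
    return rc == 0 ? 0 : 2;
}

// Parent side: create the semaphore, fork, then release the child.
// Returns the child's exit code, or -1 if setup failed.
int run_handshake(const char* name)
{
    assert(name != nullptr && name[0] == '/');  // portable names start with '/'

    sem_unlink(name);  // parent only: clear a stale semaphore from an earlier run
    sem_t* sem = sem_open(name, O_CREAT | O_EXCL, 0600, 0);
    if (sem == SEM_FAILED) { std::perror("parent sem_open"); return -1; }

    pid_t pid = fork();
    if (pid == -1) {
        std::perror("fork");
        sem_close(sem);
        sem_unlink(name);
        return -1;
    }
    if (pid == 0) {
        // Stand-in for execve plus the new program's wait_until_released().
        _exit(child_wait_until_released(name));
    }

    // Posting is fine even if the child has not reached sem_wait yet,
    // because the semaphore still exists and keeps its count.
    if (sem_post(sem) != 0) { std::perror("sem_post"); }
    sem_close(sem);

    int status = 0;
    if (waitpid(pid, &status, 0) == -1) { std::perror("waitpid"); return -1; }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_handshake with a name such as "/release_demo" (any name starting with a slash) should return 0 when the child is released and exits cleanly.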

I wanted to record the behavior I was seeing, though, because that is what really caused me problems. While debugging and fixing this problem, I learned that if the parent creates the semaphore but doesn't close it, the child was calling sem_unlink and creating its own version with the same name. This led me to believe that the original semaphore was still there but that sem_post and/or sem_wait were not working.

So be careful with your post, wait, close, and unlink calls when you are working with semaphores, especially when it comes to forked processes!
