Message queues: bug with receiving

I'm writing a piece of software that uses message queues, and I have a problem with it:

The main process creates 16 sons (with fork), and each son writes a message for the next son. Then each son waits to receive its own message. (Son "0" sends a message to son "1", ..., and son "15" sends a message to son "0".)

It works well most of the time, but sometimes something weird happens: a process never receives its message, even though the corresponding son sent it! I would say it fails about once for every 10 successful runs.

I've been able to write a piece of code that reproduces the bug:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <termios.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct buf
{
    long mtype;
    int data[32];
};

int main(int argc, char** argv)
{
    int son = 0;
    int pid = 0;
    struct buf msgbuf;

    key_t key;
    key = ftok(argv[0], 'O');

    int qid = msgget(key, IPC_CREAT | 0666);
    if(qid < 0)
    {
        printf("Error\n");
        return -1;
    }

    //Creates 16 sons
    for(int i = 0; i < 16; i++)
    {
        pid = i;
        son = fork();
        if(son == 0)
            break;
    }

    if(son == 0)
    {
        msgbuf.mtype = ((pid + 1) % 16) + 1;
        for(int i = 0; i < 32; i++)
            msgbuf.data[i] = pid;
        printf("Writing %d\n", ((pid + 1) % 16) + 1);
        msgsnd(qid, &msgbuf, 32 * sizeof(int), IPC_NOWAIT);
        printf("Waiting for %d\n", pid + 1);
        msgrcv(qid, &msgbuf, 32 * sizeof(int), pid + 1, 0);
        printf("Got %d\n", (int)msgbuf.mtype);
    }

    sleep(3);
    printf("----- END -----\n");

    msgctl(qid, IPC_RMID, NULL);

    return 0;
}

So, the expected behavior is something like this:

Writing 2
Writing 3
Waiting for 1
Waiting for 2
Got 2
Writing 4
Waiting for 3
Got 3
Writing 5
Waiting for 4
Got 4
Writing 6
Waiting for 5
Got 5
Writing 7
Waiting for 6
Got 6
Writing 8
Waiting for 7
Got 7
Writing 9
Waiting for 8
Got 8
Writing 10
Waiting for 9
Got 9
Writing 11
Waiting for 10
Got 10
Writing 12
Waiting for 11
Got 11
Writing 13
Waiting for 12
Got 12
Writing 14
Waiting for 13
Got 13
Writing 15
Waiting for 14
Got 14
Writing 16
Waiting for 15
Got 15
Writing 1
Waiting for 16
Got 16
Got 1
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----

But sometimes I get something like this:

Writing 2
Writing 3
Waiting for 1
Waiting for 2
Got 2
Writing 4
Waiting for 3
Got 3
Writing 5
Waiting for 4
Got 4
Writing 6
Waiting for 5
Got 5
Writing 7
Waiting for 6
Got 6
Writing 9
Waiting for 8
Writing 8
Waiting for 7
Got 7
Got 8
Writing 10
Waiting for 9
Got 9
Writing 11
Waiting for 10
Got 10
Writing 12
Waiting for 11
Got 11
Writing 13
Writing 14
Waiting for 12
Waiting for 13
Got 12
Writing 15
Waiting for 14
Got 14
Writing 16
Waiting for 15
Got 15
Writing 1
Waiting for 16
Got 16
Got 1
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
Got 14
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----

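As you can see, the message "13" is never received ("Waiting for 13" is never followed by "Got 13"). After 3 seconds the code removes the queue, which causes a fake "Got 14": the blocked msgrcv fails, and the buffer still holds the type 14 that this process set when sending.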

In my real code, I use semaphores to make sure the program exits only after every process has received its message, and there a deadlock occurs: the message is never received, so the semaphore is never "unlocked". So this is NOT because the sleep time is too short or anything like that, and it is NOT because I delete the queue afterwards either.

But don't forget that most of the time this works fine! I don't understand why a son sometimes never gets its message.

Can you help me?

First, a friendly pedantic nitpick regarding terminology: a forked process is customarily referred to by the more gender-neutral "child" rather than "son". :-)

Next, do you intentionally want to delay all of your worker children by 3 seconds before they exit? Because that's what the code currently does: every process, parent and children alike, has to execute that sleep(3) before exiting. When testing your code, I rewrote that block as:

if (son > 0)
{
    sleep(1);
    printf("main program exiting\n");
}
else
{
    printf("(%d) ----- END -----\n", pid);
}

I think you are misinterpreting the results in that second output block. I'm theorizing that there might be some timing/buffering problems with your output, which can happen when multiple processes try to write to stdout simultaneously.
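To rule out the buffering theory, one option is to bypass stdio buffering for the diagnostic lines. The following is only a sketch: `log_line` is a hypothetical helper (it is not in the original code) that formats a line into a local buffer and emits it with a single write(2) call, so a line is not held back in a per-process stdio buffer.

```c
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: format one line and emit it with a single write(2).
   Returns the number of bytes written, or -1 on error. */
int log_line(const char *fmt, ...)
{
    char line[256];
    va_list ap;

    va_start(ap, fmt);
    int len = vsnprintf(line, sizeof line, fmt, ap);
    va_end(ap);

    if (len < 0)
        return -1;
    if ((size_t)len >= sizeof line)      /* output was truncated to fit */
        len = (int)sizeof line - 1;

    return (int)write(STDOUT_FILENO, line, (size_t)len);
}
```

In the test program, a call such as `log_line("Got %d\n", (int)msgbuf.mtype);` would stand in for the corresponding printf. If the missing line still does not show up, buffering is probably not the cause.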

Can I ask what you're hoping to accomplish with this message queue? It seems like you're trying to arrange an assembly line of worker processes using the queue, which is not how these data structures are typically used.

I've finally found out what happened.

When I write to the message queue, I call "msgsnd(qid, &msgbuf, 32 * sizeof(int), IPC_NOWAIT);". The problem seems to be IPC_NOWAIT: it appears that the queue sometimes becomes full, and the message is then not actually written (the send is skipped because of the IPC_NOWAIT flag).

Without this flag, it works.
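The dropped message would have been visible if the return value of msgsnd had been checked. With IPC_NOWAIT, msgsnd returns -1 and sets errno to EAGAIN when the queue has no room; how much room a queue has is system-dependent. Here is a minimal sketch, where `send_checked` is a hypothetical helper name and `struct buf` is copied from the question:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct buf
{
    long mtype;
    int data[32];
};

/* Hypothetical helper: send one message and report a failure instead of
   silently ignoring it. Returns 0 on success, -1 on failure. */
int send_checked(int qid, long mtype, int value, int flags)
{
    struct buf msgbuf;

    msgbuf.mtype = mtype;                 /* must be > 0 */
    for (int i = 0; i < 32; i++)
        msgbuf.data[i] = value;

    if (msgsnd(qid, &msgbuf, sizeof msgbuf.data, flags) == -1)
    {
        if (errno == EAGAIN)              /* only possible with IPC_NOWAIT */
            fprintf(stderr, "queue full, type %ld not sent\n", mtype);
        else
            fprintf(stderr, "msgsnd: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

Passing 0 instead of IPC_NOWAIT as `flags` makes msgsnd block until there is room in the queue, which matches the fix described above.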
