Why does this multithreaded program segfault?

Question

I'm playing around with a solution for the dining philosophers problem using mutexes, but the program is segfaulting, probably because of a thread-related bug.
What I'm trying to do here is basically treat the fork as a mutex and create a function void *eat(void *arg) that locks the critical section whenever it gets called (the critical section is just the thread declaring its id and that it's currently eating). Then I loop through all my philosophers and check whether each id (ids start from 0) is divisible by 2.
In the first round only threads whose ids are divisible by 2 eat, in the second round only threads whose ids are not divisible by 2 eat, and so on in an infinite loop.
I know this is a stupidly simple solution which probably doesn't solve the problem in the first place, so please bear with me. If you have any questions, please let me know in the comments.

typedef struct s_philo
{
    pthread_t thread;
    pthread_mutex_t fork;
    int id;
}
t_philo;

void *eat(void *arg)
{
    t_philo *philo = (t_philo *)arg;

    pthread_mutex_lock(&philo->fork);
    printf("philo with id: %i is eating\n", philo->id);
    pthread_mutex_unlock(&philo->fork);
    return (NULL);
}

void first_round(t_philo *philo, int len)
{
    for (int i = 0; i < len; i++)
        if (!(i % 2))
            pthread_join(philo[i].thread, NULL);
}

void second_round(t_philo *philo, int len)
{
    for (int i = 0; i < len; i++)
        if ((i % 2))
            pthread_join(philo[i].thread, NULL);
}


int main(int argc, char **argv)
{
    t_philo *philo;
    // how many philosophers is given as first arg.
    int len = atoi(argv[1]);
    if (argc < 2)
        exit(EXIT_FAILURE);
    philo = malloc(sizeof(*philo) * atoi(argv[1]));
    //this function add id's and initialize some data.
    init_data(philo, argv);
    for (int i = 0; i < len; i++)
        pthread_create(&philo[i].thread, NULL, eat(&philo[i]), &philo[i]);

    while (1)
    {
        first_round(philo, len);
        second_round(philo, len);
    }
    return 0;
}

OUTPUT

philo with id: 0 is eating
philo with id: 1 is eating
philo with id: 2 is eating
philo with id: 3 is eating
philo with id: 4 is eating
            .
            .
            .
            .
philo with id random is eating
[1]    29903 segmentation fault

The output reaches a random id each time and then segfaults, which is why I concluded it might be a thread bug.

Answer 1

pthread_create has the following prototype:

int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine) (void *), void *arg);

start_routine is a function pointer. However, you call pthread_create with the wrong argument: eat(&philo[i]). Thus, the program calls eat correctly (in the calling thread) and then tries to call NULL from the new thread, because NULL is the value returned by eat. The randomness comes from the variable time it takes to actually create the threads.
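A minimal sketch of the corrected call, assuming the struct from the question: eat is passed by name, so pthread_create receives a function pointer and the new thread runs it. The ate field and the run_philosophers helper are hypothetical additions made for illustration; they stand in for the question's init_data and main, which are not fully shown. Each thread is also joined exactly once here, rather than repeatedly inside an infinite loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct s_philo
{
    pthread_t       thread;
    pthread_mutex_t fork;
    int             id;
    int             ate;    /* hypothetical flag: set once the thread has eaten */
}   t_philo;

/* Thread entry point: executed by the new thread, not by the caller. */
static void *eat(void *arg)
{
    t_philo *philo = (t_philo *)arg;

    assert(philo != NULL);  /* every thread is started with a valid philosopher */
    pthread_mutex_lock(&philo->fork);
    printf("philo with id: %i is eating\n", philo->id);
    philo->ate = 1;
    pthread_mutex_unlock(&philo->fork);
    return (NULL);
}

/* Hypothetical helper: starts len threads, joins each one once and
   returns how many philosophers ate, or -1 on invalid input or allocation failure. */
int run_philosophers(int len)
{
    t_philo *philo;
    int     created = 0;
    int     count = 0;

    if (len <= 0)
        return (-1);
    philo = malloc(sizeof(*philo) * len);
    if (philo == NULL)
        return (-1);
    for (int i = 0; i < len; i++)
    {
        philo[i].id = i;
        philo[i].ate = 0;
        pthread_mutex_init(&philo[i].fork, NULL);
    }
    for (int i = 0; i < len; i++)
    {
        /* Pass the function itself (eat), not the result of calling it. */
        if (pthread_create(&philo[i].thread, NULL, eat, &philo[i]) != 0)
            break;
        created++;
    }
    /* Join only the threads that were actually created, each exactly once. */
    for (int i = 0; i < created; i++)
    {
        pthread_join(philo[i].thread, NULL);
        count += philo[i].ate;
    }
    for (int i = 0; i < len; i++)
        pthread_mutex_destroy(&philo[i].fork);
    free(philo);
    return (count);
}
```

From main this could be called as run_philosophers(len) after the argc check. The order of the printed lines is not fixed, since the threads run concurrently.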

Note that using a debugger should help you find and fix the problem very easily. Debuggers like gdb are a bit hard to learn, but once you know them, errors like segfaults become almost easy to fix. I am also surprised that a compiler like clang would not notice the type mismatch at compile time.

Answer 2

In main, int len = argv[1]; is wrong. If you compiled with warnings enabled (e.g. -Wall), this statement would be flagged by the compiler.

As is, you're getting a huge value for len. So, later, the for loop will overflow the array you allocated and you have undefined behavior (UB).

You probably want: int len = atoi(argv[1]); as you have for the malloc.

And you want to do this after the argc check.

And why call atoi twice [with the fix]?

Here's the refactored code:

int
main(int argc, char **argv)
{
    t_philo *philo;

    if (argc < 2)
        exit(EXIT_FAILURE);

    // how many philosophers is given as first arg.
    int len = atoi(argv[1]);

    philo = malloc(sizeof(*philo) * len);

    // do stuff ...

    return 0;
}
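As an optional hardening step that goes beyond what the answer proposes, the count could be validated before it is used as an array length, since atoi cannot report invalid input. The sketch below uses strtol for that; parse_count is a hypothetical helper name, not something from the original code.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns the philosopher count parsed from text,
   or -1 if text is not a positive decimal integer that fits in an int. */
int parse_count(const char *text)
{
    char *end;
    long value;

    if (text == NULL)
        return (-1);
    errno = 0;
    value = strtol(text, &end, 10);
    /* Reject range errors, empty input and trailing garbage such as "5x". */
    if (errno != 0 || end == text || *end != '\0')
        return (-1);
    if (value <= 0 || value > INT_MAX)
        return (-1);
    return ((int)value);
}
```

In main, int len = parse_count(argv[1]); placed after the argc check would then let the program exit early when len is -1.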
