
Is this a bug in glibc/pthread?

I am using a robust mutex together with a condition variable. This works most of the time, but occasionally I get deadlocks. I could not reduce this to a small, reproducible example, and I consider it very likely that the problem is in my own code. However, I noticed something that looks suspicious:

When the code deadlocks, one thread is in pthread_cond_broadcast:

#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f4ab2892970 in pthread_cond_broadcast@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_broadcast.S:133

Another thread is in pthread_mutex_lock, on the mutex which is used with the condition:

#0  __lll_robust_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevelrobustlock.S:85
#1  0x00007f4ab288e7d7 in __pthread_mutex_lock_full (mutex=0x7f4a9858b128) at ../nptl/pthread_mutex_lock.c:256

As you can see, pthread_mutex_lock uses lowlevelrobustlock, while pthread_cond_broadcast uses lowlevellock. Is it possible that the condition variable somehow uses a non-robust mutex internally?

I use the mutex to protect shared memory, and it is possible that one of the processes sharing it gets killed. So maybe my deadlocks happen because a process was inside pthread_cond_broadcast when it was killed, and now the other process cannot broadcast because the killed process still owns that internal mutex? After all, a similar situation is why I started using a robust mutex in the first place.

PS: Cases where the process is killed inside the critical section are handled correctly; the robust mutex works great there. In every deadlock I observed, pthread_cond_broadcast was the active function.
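For readers unfamiliar with the recovery path, here is a minimal sketch of what a robust mutex does when its owner dies. It is an illustration under simplifying assumptions, not code from the application described here: the dead owner is simulated by a thread that exits while holding the lock (rather than by a killed process), the mutex is not process-shared, and the names dying_owner and robust_owner_death_demo are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t demo_mtx;

/* Locks the mutex and exits without unlocking it: a stand-in for an
 * owner that dies inside the critical section. */
static void *dying_owner(void *arg) {
    (void)arg;
    pthread_mutex_lock(&demo_mtx);
    return NULL;
}

/* Returns the result of the first lock attempt made after the owner
 * has died, or -1 if the setup itself failed. */
int robust_owner_death_demo(void) {
    pthread_mutexattr_t attr;
    pthread_t tid;
    int rc;

    if (pthread_mutexattr_init(&attr) != 0) return -1;
    if (pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) != 0) return -1;
    if (pthread_mutex_init(&demo_mtx, &attr) != 0) return -1;
    pthread_mutexattr_destroy(&attr);

    if (pthread_create(&tid, NULL, dying_owner, NULL) != 0) return -1;
    pthread_join(tid, NULL);

    /* The previous owner is gone; a robust mutex reports this instead
     * of blocking forever. */
    rc = pthread_mutex_lock(&demo_mtx);
    if (rc == EOWNERDEAD) {
        /* We now hold the lock; mark the protected state as usable again. */
        pthread_mutex_consistent(&demo_mtx);
    }
    if (rc == 0 || rc == EOWNERDEAD) {
        pthread_mutex_unlock(&demo_mtx);
    }
    pthread_mutex_destroy(&demo_mtx);
    return rc;
}
```

Calling robust_owner_death_demo() should return EOWNERDEAD on a system with robust mutex support. Nothing comparable exists for the lock that a condition variable may use internally, which is what the question is about.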

PPS: For the mutex there is pthread_mutexattr_setrobust, but I could not find anything like pthread_condattr_setrobust. Does it exist?

EDIT:

This 'bug' has been reported here. It is not a glibc bug but undefined behavior of condition variables in this particular use case. There are no robust condition variables, so they cannot be used safely for IPC over shared memory when a participating process may be killed. Thread cancellation can leave the condition variable in an inconsistent state.
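Since there is no robust variant, one partial mitigation is to avoid unbounded waits. The following is only a sketch and an assumption on my part, not something taken from the bug report: it waits on the predicate with pthread_cond_timedwait against CLOCK_MONOTONIC, so a waiter can at least notice that nobody is signalling. The helper names wait_predicate_timed and timed_wait_demo are hypothetical. It bounds the waiting side only; a process stuck inside pthread_cond_broadcast is not helped by it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper. The caller must hold mtx, and cond must have been
 * initialised with CLOCK_MONOTONIC. Returns 0 once *predicate is non-zero,
 * ETIMEDOUT if timeout_ms elapsed first, or another error code from
 * pthread_cond_timedwait (with a robust mutex this can be EOWNERDEAD,
 * which the caller still has to handle). */
int wait_predicate_timed(pthread_cond_t *cond, pthread_mutex_t *mtx,
                         const int *predicate, long timeout_ms) {
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* Re-check the predicate after every wakeup (spurious wakeups). */
    while (*predicate == 0 && rc == 0) {
        rc = pthread_cond_timedwait(cond, mtx, &deadline);
    }
    return (*predicate != 0) ? 0 : rc;
}

/* Waits 50 ms on a condition variable that nobody signals. */
int timed_wait_demo(void) {
    pthread_condattr_t cattr;
    pthread_cond_t cond;
    pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
    int predicate = 0;
    int rc;

    pthread_condattr_init(&cattr);
    pthread_condattr_setclock(&cattr, CLOCK_MONOTONIC);
    pthread_cond_init(&cond, &cattr);
    pthread_condattr_destroy(&cattr);

    pthread_mutex_lock(&mtx);
    rc = wait_predicate_timed(&cond, &mtx, &predicate, 50);
    pthread_mutex_unlock(&mtx);

    pthread_cond_destroy(&cond);
    return rc;
}
```

For the shared-memory case the condition variable attributes would additionally need pthread_condattr_setpshared with PTHREAD_PROCESS_SHARED, as in the example code in the previous answer.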

The previous answer is below:

I have the same problem. Here is example code that causes a deadlock in pthread_cond_broadcast:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>

#define TRUE 1
#define FALSE 0

typedef struct {
    pthread_cond_t cond;
    pthread_mutex_t mtx;
    int predicate;
} channel_hdr_t;

typedef struct {
    int fd;
    channel_hdr_t *hdr;
} channel_t;

void printUsage() {
    printf("usage: shm_comm_test2 channel_name1 channel_name2\n");
}

int robust_mutex_lock(pthread_mutex_t *mutex) {
  // lock hdr mutex in the safe way
  int lock_status = pthread_mutex_lock (mutex);
  int acquired = FALSE;
  int err = -18;
  switch (lock_status)
  {
  case 0:
    acquired = TRUE;
    break;
  case EINVAL:
    printf("**** EINVAL ****\n");
    err = -12;
    break;
  case EAGAIN:
    printf("**** EAGAIN ****\n");
    err = -13;
    break;
  case EDEADLK:
    printf("**** EDEADLK ****\n");
    err = -14;
    break;
  case EOWNERDEAD:
    // the previous owner died while holding the mutex
    printf("**** EOWNERDEAD ****\n");

    // recover the mutex: mark its state as consistent again
    if (pthread_mutex_consistent(mutex) != 0) {
      printf("**** EOWNERDEAD, EINVAL ****\n");
      err = -15;
      break;
    }
    acquired = TRUE;
    break;
  default:
    printf("**** OTHER ****\n");
    // other error
    err = -18;
    break;
  }

  return acquired ? 0 : err;
}

int init_channel(char *shm_name, channel_t *out) {
    int initialize = FALSE;

    int shm_fd = shm_open (shm_name, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP);
    if (shm_fd < 0) {
        if (errno == EEXIST) {
            // open again, do not initialize
            shm_fd = shm_open (shm_name, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP);
            if (shm_fd < 0) {
                printf( "ERROR: could not create %s, errno: %d\n", shm_name, errno );
                return 1;
            }
        }
        else {
            printf( "ERROR: could not create %s, errno: %d\n", shm_name, errno );
            return 2;
        }
    }
    else {
        // the shm object was created, so initialize it
        initialize = TRUE;

        printf("created shm object %s\n", shm_name);
        if (ftruncate (shm_fd, sizeof(channel_hdr_t)) != 0)
        {
            printf( "ERROR: could not ftruncate %s, errno: %d\n", shm_name, errno );
            close (shm_fd);
            shm_unlink (shm_name);
            return 3;
        }
    }

    void *ptr_shm_hdr = mmap (NULL, sizeof(channel_hdr_t), PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);

    if (ptr_shm_hdr == MAP_FAILED)
    {
        printf( "ERROR: could not mmap %s, errno: %d\n", shm_name, errno );
        close (shm_fd);
        shm_unlink (shm_name);
        return 4;
    }

    channel_hdr_t *shm_hdr = ptr_shm_hdr;

    if (initialize) {
        // set mutex shared between processes
        pthread_mutexattr_t mutex_attr;
        pthread_mutexattr_init(&mutex_attr);
        pthread_mutexattr_setpshared (&mutex_attr, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust (&mutex_attr, PTHREAD_MUTEX_ROBUST);
        pthread_mutexattr_setprotocol(&mutex_attr, PTHREAD_PRIO_INHERIT);

        pthread_mutex_init (&shm_hdr->mtx, &mutex_attr);

        // set condition shared between processes
        pthread_condattr_t cond_attr;
        pthread_condattr_init(&cond_attr);
        pthread_condattr_setpshared (&cond_attr, PTHREAD_PROCESS_SHARED);
        pthread_cond_init (&shm_hdr->cond, &cond_attr);
    }

    // note: the predicate is reset on every start, without holding the mutex
    shm_hdr->predicate = 0;
    out->fd = shm_fd;
    out->hdr = shm_hdr;

    return 0;
}

int main(int argc, char **argv) {
    if (argc != 3) {
        printUsage();
        return 0;
    }

    char *shm_1_name = argv[1];
    char *shm_2_name = argv[2];

    channel_t ch_1;
    if (init_channel(shm_1_name, &ch_1) != 0) {
        return 1;
    }

    channel_t ch_2;
    if (init_channel(shm_2_name, &ch_2) != 0) {
        munmap( ch_1.hdr, sizeof(channel_hdr_t) );
        close( ch_1.fd );
        return 2;
    }

    int counter = 0;
    int counter2 = 0;
    while (TRUE) {
        ++counter;
        if (counter == 100000) {
            printf("alive %d\n", counter2);
            ++counter2;
            counter = 0;
        }
        int ret = robust_mutex_lock(&ch_1.hdr->mtx);
        if (ret != 0) {
            return ret;
        }
        ch_1.hdr->predicate = 1;
        pthread_cond_broadcast (&ch_1.hdr->cond);     // deadlock here
        pthread_mutex_unlock (&ch_1.hdr->mtx);



        ret = robust_mutex_lock(&ch_2.hdr->mtx);
        if (ret != 0) {
            return ret;
        }

        while (ch_2.hdr->predicate == 0 && ret == 0)
        {
            ret = pthread_cond_wait (&ch_2.hdr->cond, &ch_2.hdr->mtx);  // deadlock here
        }
        ch_2.hdr->predicate = 0;
        pthread_mutex_unlock (&ch_2.hdr->mtx);
    }

    munmap( ch_1.hdr, sizeof(channel_hdr_t) );
    close( ch_1.fd );

    munmap( ch_2.hdr, sizeof(channel_hdr_t) );
    close( ch_2.fd );

    return 0;
}

To reproduce the deadlock:

  1. Run the first instance of the program with the arguments: channel1 channel2
  2. Run the second instance of the program with the arguments: channel2 channel1
  3. Interrupt both programs with Ctrl+C
  4. Run both programs again

The problem does not occur on Ubuntu 16.04, but it does on 18.04.
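A likely explanation for the difference is the glibc version: Ubuntu 16.04 ships glibc 2.23 and 18.04 ships glibc 2.27, and the condition variable implementation was rewritten in glibc 2.25. To check which side of that change a given machine is on, something like the sketch below can be used. gnu_get_libc_version is a real glibc function; the helper names has_rewritten_condvar and running_glibc_has_rewritten_condvar are made up for this example, and the code is glibc-specific.

```c
#include <gnu/libc-version.h>
#include <stdio.h>

/* Returns 1 if the given glibc version is 2.25 or later, i.e. it
 * contains the rewritten condition variable implementation. */
int has_rewritten_condvar(int major, int minor) {
    return major > 2 || (major == 2 && minor >= 25);
}

/* Checks the glibc the program is actually running against.
 * Returns 1 or 0, or -1 if the version string could not be parsed. */
int running_glibc_has_rewritten_condvar(void) {
    int major = 0, minor = 0;

    if (sscanf(gnu_get_libc_version(), "%d.%d", &major, &minor) != 2) {
        return -1;
    }
    return has_rewritten_condvar(major, minor);
}
```

On an affected system, running_glibc_has_rewritten_condvar() would be expected to return 1.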

The backtraces of both programs in deadlock:

First:

#0  0x00007f9802d989f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f98031cd02c)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f98031cd030, cond=0x7f98031cd000) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x7f98031cd000, mutex=0x7f98031cd030) at pthread_cond_wait.c:655
#3  0x00005648bc2af081 in main (argc=<optimized out>, argv=<optimized out>)
    at /home/dseredyn/ws_velma/ws_fabric/src/shm_comm/src/test2.c:198

Second:

#0  0x00007f1a3434b724 in futex_wait (private=<optimized out>, expected=3, futex_word=0x7f1a34780010)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1  futex_wait_simple (private=<optimized out>, expected=3, futex_word=0x7f1a34780010)
    at ../sysdeps/nptl/futex-internal.h:135
#2  __condvar_quiesce_and_switch_g1 (private=<optimized out>, g1index=<synthetic pointer>, wseq=<optimized out>, 
    cond=0x7f1a34780000) at pthread_cond_common.c:412
#3  __pthread_cond_broadcast (cond=0x7f1a34780000) at pthread_cond_broadcast.c:73
#4  0x0000557a978b2043 in main (argc=<optimized out>, argv=<optimized out>)
    at /home/dseredyn/ws_velma/ws_fabric/src/shm_comm/src/test2.c:185
