Why does pthread_mutex_t segfault when trying to lock through shared memory from two different processes?

I wrote a super simple wrapper for a pthread_mutex_t meant to be used between two processes:

//basic version just to test using it between two processes
struct MyLock
{
    public:
        MyLock() {
            pthread_mutexattr_init(&attr);
            pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
            pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);

            pthread_mutex_init(&lock, &attr);
        }

        ~MyLock() {
            pthread_mutex_destroy(&lock);
            pthread_mutexattr_destroy(&attr);
        }

        void lock() {
            pthread_mutex_lock(&lock);
        }

        void unlock() {
            pthread_mutex_unlock(&lock);
        }

    private:
        pthread_mutexattr_t attr;
        pthread_mutex_t lock;
};

This lock works fine between regular threads in a single process, but the problem appears when I run process A, which does the following in a shared memory region:

void* mem; //some shared memory from shm_open
MyLock* myLock = new(mem) MyLock;
//loop sleeping random amounts and calling ->lock and ->unlock

Then process B opens the shared memory object (I verified that it is the same region of memory by writing combinations of characters into it) and does this:

MyLock* myLock = reinterpret_cast<MyLock*>(mem);
//same loop for locking and unlocking as process A

but process B segfaults when trying to lock, with the backtrace leading to pthread_mutex_lock() in libpthread.so.0.

What am I doing wrong?

The backtrace I get from process B looks like this:

in pthread_mutex_lock () from /lib64/libpthread.so.0
in MyLock::lock at MyLock.H:50
in Server::setUpSharedMemory at Server.C:59
in Server::Server at Server.C
in main.C:52

The call was the very first call to lock after reinterpret-casting the memory into a MyLock*. If I dump the contents of MyLock in gdb in the crashing process, I see:

{
attr = {
    __size = "\003\000\000\200",
    __align = -2147483645
},
lock = {
    __data = {
      __lock = 1,
      __count = 0,
      __owner = 6742, //this is the lightweight process id of a thread in process A
      __nusers = 1,
      __kind = 131,
      __spins = 0,
      __list = {
        __prev = 0x0,
        __next = 0x0
       }
      },
      __size = "\001\000\000\000\000 //etc,
      __align = 1     
  }
}

so it looks all right (it looks the same in gdb for the other process as well). I am compiling both applications together, with no additional optimization flags.

You didn't post the code that opens and initializes the shared memory region, but I suspect that part might be responsible for your problem.

Because pthread_mutex_t is much larger than a "combination of characters", you should check your shm_open(3), ftruncate(2), mmap(2) sequence by reading and writing a longer (roughly kilobyte-sized) string.

Don't forget to check that both endpoints can really write to the shm region and that the written data is really visible to the other side.

Process A: [open and initialize the shm]-[write AAA...AA]-[sleep 5 sec]-[read BBB...BB]-[close the shm]

Process B: (a second or two later) [open the shm]-[read AAA...AA]-[write BBB...BB]-[close the shm]
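The check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `map_region`, `write_pattern` and `check_pattern` are hypothetical and do not come from the question, and the object name and size are up to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: map `size` bytes of the POSIX shared memory object `name`.
// With create == true the object is created (if needed) and sized first.
// Returns nullptr on any failure instead of MAP_FAILED.
void* map_region(const char* name, std::size_t size, bool create) {
    int flags = create ? (O_CREAT | O_RDWR) : O_RDWR;
    int fd = shm_open(name, flags, 0600);
    if (fd == -1) return nullptr;
    if (create && ftruncate(fd, static_cast<off_t>(size)) == -1) {
        close(fd);
        return nullptr;
    }
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : p;
}

// Fill the first `len` bytes of the region with the character `c`.
void write_pattern(void* mem, char c, std::size_t len) {
    std::memset(mem, c, len);
}

// Return true if the first `len` bytes of the region all equal `c`.
bool check_pattern(const void* mem, char c, std::size_t len) {
    const char* p = static_cast<const char*>(mem);
    for (std::size_t i = 0; i < len; ++i) {
        if (p[i] != c) return false;
    }
    return true;
}
```

Process A would call map_region(name, size, true), write_pattern with 'A', sleep, then check_pattern for 'B'. Process B would call map_region(name, size, false), check_pattern for 'A', then write_pattern with 'B'. A nullptr result or a failed pattern check points at the shared memory setup rather than at the mutex.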

I had a similar issue where the writer process runs as root and the reader processes run as regular users (the case of a hardware daemon). The readers would segfault as soon as pthread_mutex_lock() or pthread_cond_wait(), or their unlock counterparts, were called.

I solved it by modifying the SHM file permissions using an appropriate umask:

Writer

umask(0); /* clear the mask so the mode passed to shm_open is applied unchanged */
FD=shm_open("the_SHM_file", O_CREAT|O_TRUNC|O_RDWR, S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH);
ftruncate(FD, 28672);
SHM=mmap(0, 28672, PROT_READ|PROT_WRITE, MAP_SHARED, FD, 0);

Readers

FD=shm_open("the_SHM_file", O_RDWR, S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH);
SHM=mmap(0, 28672, PROT_READ|PROT_WRITE, MAP_SHARED, FD, 0);
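One plausible reading of this answer is that the readers could not open or map the object at all, and the unchecked result was then used as a pointer. mmap reports failure by returning MAP_FAILED, not a null pointer, so a crash can show up later inside pthread_mutex_lock(). The sketch below is an assumption-laden illustration, not the answerer's code: `open_existing` is a hypothetical helper that checks both calls and reports the errno value.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: map an existing shared memory object read/write.
// On failure returns nullptr and stores the errno value in *err.
void* open_existing(const char* name, std::size_t size, int* err) {
    *err = 0;
    int fd = shm_open(name, O_RDWR, 0);  // the mode argument is ignored without O_CREAT
    if (fd == -1) {
        *err = errno;  // e.g. EACCES when the permissions do not allow access
        return nullptr;
    }
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int saved = errno;  // only meaningful if mmap failed
    close(fd);
    if (p == MAP_FAILED) {
        *err = saved;
        return nullptr;
    }
    return p;
}
```

A reader would call open_existing and print strerror(err) when the result is nullptr, which makes a permission problem visible before any mutex call is attempted.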

You don't say what OS you are using, and you don't check the return value of the pthread_mutexattr_setpshared call. It's possible your OS does not support process-shared mutexes and this call is failing.
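A minimal sketch of what checking those return values might look like is shown here. The function name `init_shared_mutex` is hypothetical, and the non-portable PTHREAD_MUTEX_ADAPTIVE_NP type from the question is left out on purpose to keep the example portable. The pthread functions return an error number directly rather than setting errno.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical helper: initialise *m as a process-shared mutex.
// Returns 0 on success or the first pthread error code encountered.
int init_shared_mutex(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    // This is the call the answer suggests checking.
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    // The attribute object is not needed once the mutex is initialised.
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

In the question's setup, process A would call this once on the pthread_mutex_t placed in the shared region and refuse to continue on a nonzero result, for example after printing strerror(rc).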
