Boost shared-memory find method gets stuck in a C++ multi-process project

Question

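I'm using Boost's interprocess (IPC) library to store complex objects, including images, in shared memory that is used by multiple processes. Let's call this object MyImage. The shared memory is a circular buffer that holds several MyImage objects at a time.

In my code, two (or more) processes write to a segment in shared memory, and another process reads from it. This flow works as expected, but after the reader process finishes or crashes and is started again, it gets stuck on the find method when it tries to open the same object in shared memory, while the writer processes keep running fine.

I tried to work out which race condition could cause this, but couldn't find any explanation in my code or in Boost's documentation.

Here is a simple code example that reproduces the problem in my project:

Writer program: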

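#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/ipc/message_queue.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/interprocess/smart_ptr/shared_ptr.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/circular_buffer.hpp>
#include <cstdlib>   // exit
#include <unistd.h>  // usleep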

using namespace std;
namespace bip = boost::interprocess;

static const char *const PLACE_SHM_NAME = "PlaceInShm";
static const char *const OBJECT_SHM_NAME = "ObjectInShm";
static const char *const OPEN_SHM_LOCK = "OPEN_SHM_LOCK";
static const char *const PUSH_POP_LOCK = "push_pop_image_lock";
static const int IMAGES_IN_BUFFER = 20;
static const int OBJECT_SIZE_IN_SHM = 91243520;

class MyImage;

typedef bip::managed_shared_memory::segment_manager SegmentManagerType;
typedef bip::allocator<void, SegmentManagerType> MyImageVoidAllocator;
typedef bip::deleter<MyImage, SegmentManagerType> MyImageDeleter;
typedef bip::shared_ptr<MyImage, MyImageVoidAllocator, MyImageDeleter> MyImageSharedPtr;

typedef bip::allocator<MyImageSharedPtr, bip::managed_shared_memory::segment_manager> MyImageShmemAllocator;
typedef boost::circular_buffer<MyImageSharedPtr, MyImageShmemAllocator> MyImageContainer;

MyImageSharedPtr GetMyImage() {
    // some implementation
    return {};
}

int main(int argc, char *argv[]) {

    MyImageContainer *my_image_data_container;
    try {
        bip::named_mutex open_lock{bip::open_or_create, OPEN_SHM_LOCK};
        bip::managed_shared_memory image_segment = bip::managed_shared_memory(bip::open_or_create, PLACE_SHM_NAME, OBJECT_SIZE_IN_SHM);
        my_image_data_container = image_segment.find_or_construct<MyImageContainer>(OBJECT_SHM_NAME)(IMAGES_IN_BUFFER, image_segment.get_segment_manager());
    } catch (boost::interprocess::interprocess_exception &e) {
        exit(1);
    }
    boost::interprocess::named_mutex my_image_mutex_ptr(boost::interprocess::open_or_create, PUSH_POP_LOCK);

    while (true) {
        MyImageSharedPtr img = GetMyImage();
        my_image_mutex_ptr.lock();
        my_image_data_container->push_back(img);
        my_image_mutex_ptr.unlock();
        usleep(1000);
    }
}

Reader process:

int main(int argc, char *argv[]) {

    MyImageContainer *my_image_data_container;
    try {
        bip::named_mutex open_lock{bip::open_only, OPEN_SHM_LOCK};
        bip::scoped_lock<bip::named_mutex> lock(open_lock, bip::try_to_lock);
        bip::managed_shared_memory image_segment = bip::managed_shared_memory(bip::open_only, PLACE_SHM_NAME);
        my_image_data_container = image_segment.find<MyImageContainer>(OBJECT_SHM_NAME).first;
    } catch (boost::interprocess::interprocess_exception &e) {
        exit(1);
    }
    boost::interprocess::named_mutex my_image_mutex_ptr(boost::interprocess::open_or_create, PUSH_POP_LOCK);

    while (true) {
        if (my_image_data_container->size() == 0) {
            continue;
        }
        MyImage *img;
        my_image_mutex_ptr.lock();
        img = &(*my_image_data_container->at(0));
        my_image_data_container->pop_front();
        my_image_mutex_ptr.unlock();
        // do stuff with img
        usleep(1000);
    }
}

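Steps to reproduce the bug:

Run two processes with the Writer code.
Run one process with the Reader code.
Kill the Reader process.
Run the Reader process again.

On the second run, the process gets stuck on the line image_segment.find<MyImageContainer>(OBJECT_SHM_NAME).first; while the Writer processes are fine.

It is important to mention that each Writer process has a unique id and writes only int(IMAGES_IN_BUFFER / NUMBER_OF_WRITERS) images into the buffer in shared memory, starting at the index that corresponds to its id. For example, with two Writers, id 0 and id 1, and IMAGES_IN_BUFFER=20, Writer 0 writes to indexes 0-9 and Writer 1 to indexes 10-19.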
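
As a rough sketch of the partitioning described above, the slot range owned by each Writer could be computed as follows. The helper writer_slot_range is hypothetical (it is not part of the original code), and it assumes that IMAGES_IN_BUFFER divides evenly among the Writers.

```cpp
#include <utility>

// Hypothetical helper (not in the original code): returns the half-open
// index range [first, last) that the writer with the given id may write to.
// Assumes images_in_buffer is evenly divisible by number_of_writers.
std::pair<int, int> writer_slot_range(int images_in_buffer,
                                      int number_of_writers,
                                      int writer_id) {
    const int per_writer = images_in_buffer / number_of_writers;
    const int first = writer_id * per_writer;
    return {first, first + per_writer};
}
```

With IMAGES_IN_BUFFER=20 and two Writers, Writer 0 gets [0, 10) and Writer 1 gets [10, 20), which matches the example above.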

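Some of my debugging so far:

I tried opening the shared memory in a separate thread using a future object, with a timeout of a few seconds. The whole process still got stuck.
When I kill the process after it gets stuck and rerun it, it never succeeds again unless I remove the object from shared memory and rerun all processes, including the Writers.
When running with a single Writer I usually cannot reproduce the bug, but I can't say so for sure.
It is not consistent: I cannot predict when it will get stuck and when it won't.
Maybe the object in shared memory gets corrupted somehow while the Reader process is crashing, and then reopening it fails. In that case I would expect Boost to throw an exception rather than hang.
It can also happen when the process exits normally, with exit code 0.

I'd be glad to hear some ideas about what could cause the process to get stuck. Thanks in advance!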

Answer 1

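I see quite a few problems with your code. Besides that, there is a known limitation, so let's start with that.

Robustness of interprocess mutexes

First, a well-known issue with the library is that there is no robust interprocess mutex (portably):

How do I take ownership of an abandoned boost::interprocess::interprocess_mutex?
Boost interprocess mutexes and checking for abandonment
etc.
But also see https://www.boost.org/doc/libs/1_76_0/boost/interprocess/detail/robust_emulation.hpp; I think that on some platforms/filesystems there are alternatives that use file locks.

So the best you can probably do is indeed a timed wait, plus a "force clear" option that you invoke manually when you know it is safe to do so.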
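
A minimal sketch of such a timed wait, under stated assumptions: try_lock_with_timeout and AbandonedMutex are hypothetical names introduced here for illustration. The helper only relies on try_lock(), which both std::mutex and bip::named_mutex provide, so the same shape should carry over to the named mutexes in the question.

```cpp
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical helper: polls try_lock() until it succeeds or the timeout
// expires. Works with any type that offers try_lock(), which includes
// std::mutex and boost::interprocess::named_mutex.
template <typename Mutex>
bool try_lock_with_timeout(Mutex& mutex, std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    do {
        if (mutex.try_lock())
            return true;  // caller now owns the mutex and must unlock it
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    } while (std::chrono::steady_clock::now() < deadline);
    return false;  // possibly abandoned: let the operator decide what to do
}

// Hypothetical stand-in for a mutex whose owner died while holding it:
// it can never be acquired again.
struct AbandonedMutex {
    bool try_lock() { return false; }
    void unlock() {}
};
```

If the helper returns false and you are certain that no live process holds the lock, the "force clear" step would be a call to the static bip::named_mutex::remove(name) followed by re-creating the mutex. That decision is deliberately left outside the helper, because removing a mutex that is still in use is not safe.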

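Code issues and review

That said, the code has a number of problems, and there are things you can improve to make it less fragile.

You mention crashes. It goes without saying: crashes break invariants, so avoid them.
This may include having a proper interrupt signal handler to make sure you shut down properly.
You never lock the open lock in the writer path. That's an obvious problem that should be fixed.
In the reader path you issue a try_lock but never seem to verify that it succeeded.
In the code shown, you use my_image_data_container after the shared memory segment has been destructed. By definition, this is always undefined behavior.
You don't use RAII-enabled lock guards for the push/pop mutex (my_image_mutex_ptr). That means it is not exception safe and, again, will lead to locks getting stuck on exceptions.
In general, you seem to confuse lockable primitives with locks. I'd suggest renaming the objects (open_lock -> open_mutex, my_image_mutex_ptr (?!) -> modify_mutex) to avoid that confusion.
I'd probably suggest using the same mutex for opening and modifying (after all, creating the segment shouldn't really be allowed during a modification, should it?). Alternatively, consider using an unnamed interprocess mutex inside the shared segment to remove unnecessary SHM namespace pollution. (That also leaves fewer locks that can stay stuck even after the shared memory segment itself has been removed.)
The emptiness check (size() == 0 in your code) is a data race:
- it does not lock the modify mutex, so the container may be written to concurrently
- it does not hold the same lock as the pop that follows, so once the check returns, its result is no longer reliable because something may have been modified in the meantime
In C++, a data race also invokes undefined behavior.
Here is a big problem: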
```
 img = &(*container->at(0));
```
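This dereferences the shared pointer and keeps only the raw pointer. However, the next line calls pop_front() on the container, so the shared pointer is removed, possibly (likely, given the code shown) destroying the image.
Simply don't lose the reference count: hold on to the shared pointer.
Many of the names could be improved for readability. You'd often think "the computer doesn't care", but humans do. All the micro-confusions compound, which probably explains 50% of the bugs found in this post.
Some loose ends (exceptions are best caught by const&; you should check the bool returned by find<>(); container->at(0) can be spelled container->front(); etc.)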
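
To make the raw-pointer problem from the list above concrete, here is a small single-process sketch. It uses std::shared_ptr and std::deque as stand-ins for bip::shared_ptr and the circular buffer, and TrackedImage is a hypothetical type that only records its own destruction. It illustrates the lifetime rule; it is not a drop-in replacement for the shared-memory code.

```cpp
#include <deque>
#include <memory>
#include <utility>

// Hypothetical stand-in for MyImage that records when it is destroyed.
struct TrackedImage {
    explicit TrackedImage(bool& destroyed) : destroyed_(destroyed) {}
    ~TrackedImage() { destroyed_ = true; }
    bool& destroyed_;
};

using ImagePtr = std::shared_ptr<TrackedImage>;

// Pattern from the question: keep a raw pointer, then pop.
// Returns true if the image was destroyed by pop_front().
bool raw_pointer_dangles_after_pop() {
    bool destroyed = false;
    std::deque<ImagePtr> buffer;
    buffer.push_back(std::make_shared<TrackedImage>(destroyed));
    TrackedImage* raw = &*buffer.at(0);
    buffer.pop_front();  // last owner is gone, so the image is destroyed
    (void)raw;           // dangling from here on: must not be dereferenced
    return destroyed;
}

// Suggested pattern: take over the shared pointer before popping.
// Returns true if the image is still alive after pop_front().
bool shared_pointer_survives_pop() {
    bool destroyed = false;
    std::deque<ImagePtr> buffer;
    buffer.push_back(std::make_shared<TrackedImage>(destroyed));
    ImagePtr img = std::move(buffer.front());
    buffer.pop_front();  // only removes the now-empty slot
    return img != nullptr && !destroyed;
}
```

Both functions return true: in the first the image dies at pop_front() while a raw pointer to it is still around, in the second the local shared pointer keeps it alive for as long as it is needed.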

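Counter demo

Here is a version of the code reviewed against the points above and written in a more modern C++ style. The writer and the reader now live in a single main: run it without arguments for the writer, or pass any command-line argument for the reader.

Live on Coliru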

#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/ipc/message_queue.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
#include <boost/interprocess/smart_ptr/shared_ptr.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/circular_buffer.hpp>
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <thread>

namespace bip = boost::interprocess;
using namespace std::chrono_literals;
constexpr char const* OPEN_SHM_LOCK = "OPEN_SHM_LOCK";

static const char* const SHM_NAME      = "PlaceInShm";
static const char* const OBJ_NAME      = "ObjectInShm";
static const char* const PUSH_POP_LOCK = "push_pop_image_lock";
static const int         BUF_CAPACITY  = 20;
static const int         SHM_SIZE      = 91243520;

using Segment = bip::managed_shared_memory;
using Mgr     = Segment::segment_manager;

class MyImage{};

template <typename T> using Alloc = bip::allocator<T, Mgr>;
using MyDeleter                   = bip::deleter<MyImage, Mgr>;
using SharedImage = bip::shared_ptr<MyImage, Alloc<MyImage>, MyDeleter>;

using Container = boost::circular_buffer<SharedImage, Alloc<SharedImage>>;

SharedImage GetMyImage() {
    // some implementation
    return {};
}

int main(int argc, char**) try {
    bool const isWriter = (argc == 1);
    std::cout << (isWriter? "Writer":"Reader") << std::endl;

    // extract variable part to reduce code duplication
    auto find_container = [isWriter](Segment& smt) {
        if (isWriter)
            return smt.find_or_construct<Container>(OBJ_NAME)(
                BUF_CAPACITY, smt.get_segment_manager());

        auto [container, ok] = smt.find<Container>(OBJ_NAME);
        assert(ok); // TODO proper error handling?

        return container;
    };

    bip::named_mutex open_mutex{bip::open_or_create, OPEN_SHM_LOCK};
    if (std::unique_lock open_lk{open_mutex, std::try_to_lock}) {
        Segment smt(bip::open_or_create, SHM_NAME, SHM_SIZE);
        auto container = find_container(smt);

        open_lk.unlock();

        bip::named_mutex modify_mutex(bip::open_or_create, PUSH_POP_LOCK);

        while (isWriter) {
            SharedImage img = GetMyImage();

            {
                std::unique_lock lk(modify_mutex);
                container->push_back(img);
            }
            std::cout << "Pushed" << std::endl;
            std::this_thread::sleep_for(1s);
        }

        while (not isWriter) {
            SharedImage img;

            if (std::unique_lock lk(modify_mutex); !container->empty()) {
                img = std::move(container->front());
                container->pop_front();
            } else {
                continue;
            }

            // if (img)
            {
                // do stuff with img
                std::cout << "Popped" << std::endl;
            }

            std::this_thread::sleep_for(1s);
        }
    } else {
        std::cout << "Failed to acquire open lock" << std::endl;
    }
} catch (bip::interprocess_exception const& e) {
    std::cerr << "Error: " << e.what() << std::endl;
    exit(1);
}

Runs fine on my system. I leave it as an exercise for the reader to replace the modify lock and to add a signal handler for shutdown.
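
For the signal-handler part of that exercise, one possible shape is sketched here. The names g_stop_requested, request_stop, install_stop_handlers and run_until_stopped are hypothetical. The idea is that the handler only sets a flag that the main loop polls, so that lock guards and the segment are released by normal scope exit. This cannot help against SIGKILL, which cannot be caught.

```cpp
#include <csignal>

// Hypothetical flag set by the handler and polled by the main loop.
// volatile std::sig_atomic_t is safe to write from a signal handler.
volatile std::sig_atomic_t g_stop_requested = 0;

// The handler does nothing except record that a stop was requested.
extern "C" void request_stop(int /*signal_number*/) { g_stop_requested = 1; }

// Installs the handler for the usual "please terminate" signals.
void install_stop_handlers() {
    std::signal(SIGINT, request_stop);
    std::signal(SIGTERM, request_stop);
}

// Calls step() repeatedly until a stop has been requested and returns the
// number of completed iterations. Because the loop exits normally, any
// RAII lock guards inside step() are released before shutdown.
template <typename Step>
int run_until_stopped(Step step) {
    int iterations = 0;
    while (!g_stop_requested) {
        step();
        ++iterations;
    }
    return iterations;
}
```

In the demo above, the bodies of the while (isWriter) and while (not isWriter) loops would become the step passed to run_until_stopped, after a call to install_stop_handlers() at the start of main.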

Recorded demo

Here is a simple recorded demo showing that it works as expected on my system, with varying numbers of readers/writers.

Solution 1
4 2021-06-14 12:14:17
