Memory barrier across processes

Question

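I'm using a memory-mapped file to share data across processes.

I have two processes: one writes chunks of data and the other reads them. So that the reader knows whether a chunk is ready, I write two "tag" values, one at the start and one at the end of each chunk, to signal that it is ready.

It looks something like this:

Note: in this example I leave out the fact that the reader process can seek to previous chunks.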

static const int32_t START_TAG = 0xFAFAFAFA;
static const int32_t END_TAG = 0x06060606;

void writer_process(int32_t* memory_mapped_file_ptr)
{
    auto ptr = memory_mapped_file_ptr;
    while (true)
    {
        std::vector<int32_t> chunk = generate_chunk();
        // std::copy takes (first, last, destination); the payload goes after the two header words
        std::copy(chunk.begin(), chunk.end(), ptr + 2);

        // We are done writing. Write the tags.

        *ptr = START_TAG;
        ptr += 1;
        *ptr = chunk.size();
        ptr += 1 + chunk.size();
        *ptr = END_TAG;
        ptr += 1;
    }   
}

void reader_process(int32_t* memory_mapped_file_ptr)
{
    auto ptr = memory_mapped_file_ptr;
    while (true)
    {
        auto ptr2 = ptr;

        std::this_thread::sleep_for(std::chrono::milliseconds(20));

        if (*ptr2 != START_TAG)
            continue;

        ptr2 += 1;

        auto len = *ptr2;
        ptr2 += 1;

        if (*(ptr2 + len) != END_TAG)
            continue;

        std::vector<int32_t> chunk(ptr2, ptr2 + len);

        process_chunk(chunk);

        // move past the payload and the end tag to the next chunk
        ptr = ptr2 + len + 1;
    }
}

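So far, this kind of works. But it looks like a very bad idea to me, and it could lead to all sorts of strange bugs because of cache behaviour.

Is there a better way to achieve this?

I have looked at:

Message queues: inefficient, and they only work with a single reader. I also cannot seek to previous chunks.
Mutexes: I'm not sure how to lock only the current chunk rather than the whole memory. I can't have a mutex for every possible chunk (especially since they have a dynamic size). I considered splitting the memory into blocks with one mutex each, but that doesn't work for me because of the latency it adds between writing and reading.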

Answer 1

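As others have mentioned, you need some kind of memory barrier to make sure things are properly synchronized between multiple processors (and processes).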
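To make the barrier idea concrete, here is a minimal sketch that uses C++11 std::atomic with release/acquire ordering instead of the Windows Interlocked API. The names ChunkHeader, init_chunk, publish_chunk and try_read_chunk are hypothetical, and the sketch assumes that std::atomic<uint32_t> is lock-free and address-free on your platform, so that it can live inside memory mapped by two processes. The writer copies the payload first and only then sets the flag with a release store; a reader that sees the flag through an acquire load is guaranteed to see the payload as well.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <vector>

// The flag must be lock-free to be usable across processes (assumption of this sketch).
static_assert(ATOMIC_INT_LOCK_FREE == 2, "atomic<uint32_t> must always be lock-free");

// Hypothetical chunk layout: a ready flag, a length, then the payload.
struct ChunkHeader
{
    std::atomic<uint32_t> ready;   // 0 = not published, 1 = published
    uint32_t              length;  // number of int32_t values that follow
};

// Place an unpublished header at the start of a suitably aligned region.
inline ChunkHeader* init_chunk(void* region)
{
    ChunkHeader* hdr = new (region) ChunkHeader;
    hdr->length = 0;
    hdr->ready.store(0, std::memory_order_relaxed);
    return hdr;
}

// Writer side: copy the payload first, then publish it with a release store.
inline void publish_chunk(ChunkHeader* hdr, const std::vector<int32_t>& chunk)
{
    assert(hdr != nullptr);
    unsigned char* payload = reinterpret_cast<unsigned char*>(hdr + 1);
    if (!chunk.empty())
        std::memcpy(payload, chunk.data(), chunk.size() * sizeof(int32_t));
    hdr->length = static_cast<uint32_t>(chunk.size());
    // Everything written above becomes visible to a reader that
    // observes ready == 1 through an acquire load.
    hdr->ready.store(1, std::memory_order_release);
}

// Reader side: returns true and fills `out` only if the chunk was published.
inline bool try_read_chunk(const ChunkHeader* hdr, std::vector<int32_t>& out)
{
    assert(hdr != nullptr);
    if (hdr->ready.load(std::memory_order_acquire) == 0)
        return false;
    const unsigned char* payload = reinterpret_cast<const unsigned char*>(hdr + 1);
    out.resize(hdr->length);
    if (!out.empty())
        std::memcpy(out.data(), payload, out.size() * sizeof(int32_t));
    return true;
}
```

With a flag like this a single marker is enough; the separate start and end tags exist only to guess whether the writer has finished, which the release store states explicitly.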

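I would suggest changing your scheme to use a header that defines the set of currently available entries, and using an interlocked increment whenever a new entry becomes available.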

http://msdn.microsoft.com/en-us/library/windows/desktop/ms683614%28v=vs.85%29.aspx

The structure I'm suggesting is along these lines, so that you can actually achieve what you want, and do it quickly:

// at the very start, the number of buffers you might have total
uint32_t   m_size;    // if you know the max. number maybe use a const instead...

// then m_size structures, one per buffer:
uint32_t   m_offset0;  // offset to your data
uint32_t   m_size0;    // size of that buffer
uint32_t   m_busy0;    // whether someone is working on the buffer
uint32_t   m_offset1;
uint32_t   m_size1;
uint32_t   m_busy1;
...
uint32_t   m_offsetN;
uint32_t   m_sizeN;
uint32_t   m_busyN;

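With the offset and size you get direct access to any buffer in the mapped area. To allocate a buffer you probably want to implement something similar to malloc(), although all the necessary information is found in this table, so no linked list or the like is needed. However, if you are going to free some buffers, you need to keep track of their sizes. And if you allocate and free all the time, you'll have fun with fragmentation. Anyway...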
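As an illustration of the "direct access" part, here is a small sketch that resolves a table entry to a pointer into the mapped area. BufferEntry, BufferView and locate_buffer are hypothetical names, and the sketch assumes that offsets are byte offsets from the start of the mapping and that an offset of 0 means "not allocated". It also bounds-checks the entry, since the other process may be in the middle of updating the table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of one table entry from the layout above.
struct BufferEntry
{
    uint32_t m_offset;  // byte offset from the start of the mapping, 0 = unallocated
    uint32_t m_size;    // size of the buffer in bytes
    uint32_t m_busy;    // usage counter (not used by this sketch)
};

struct BufferView
{
    const unsigned char* data;  // nullptr when the entry cannot be resolved
    uint32_t             size;
};

// Resolve entry n of the table to a view of its buffer, or {nullptr, 0}
// if the index is out of range, the entry is unallocated, or the buffer
// would extend past the end of the mapped region.
inline BufferView locate_buffer(const unsigned char* base, std::size_t region_size,
                                const BufferEntry* table, uint32_t count, uint32_t n)
{
    assert(base != nullptr && table != nullptr);
    if (n >= count)
        return BufferView{nullptr, 0};
    const BufferEntry& e = table[n];
    // 64-bit arithmetic so that offset + size cannot wrap around.
    const uint64_t end = static_cast<uint64_t>(e.m_offset) + e.m_size;
    if (e.m_offset == 0 || end > region_size)
        return BufferView{nullptr, 0};
    return BufferView{base + e.m_offset, e.m_size};
}
```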

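Another approach is to use a ring buffer (essentially a "pipe"), so you always allocate after the last buffer, and if there isn't enough room there, you allocate at the very beginning, closing as many of the oldest buffers as the new buffer's size requires... That would probably be easier to implement. However, it means you need to know where to start when looking for a buffer (i.e. keep an index of what is currently considered the "first" [oldest] buffer, which happens to be the next one to be reused).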
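A rough sketch of that ring allocation could look like the following. RingAllocator, ring_alloc and ranges_overlap are hypothetical names; offsets here are relative to the start of the data area (so 0 is a valid offset in this sketch, unlike m_offset in the table), and retiring the old buffers that a new allocation overlaps is left to the caller, who can use ranges_overlap to find them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ring allocator state for the data area.
struct RingAllocator
{
    uint32_t capacity;  // size of the data area in bytes
    uint32_t head;      // offset of the next free byte, always <= capacity
};

// Reserve `size` bytes right after the last buffer. If the tail of the
// area is too small, wrap around and allocate at offset 0 instead.
// Returns false if the request can never fit.
inline bool ring_alloc(RingAllocator& r, uint32_t size, uint32_t& offset_out)
{
    assert(r.head <= r.capacity);
    if (size == 0 || size > r.capacity)
        return false;
    if (r.capacity - r.head < size)
        r.head = 0;  // not enough room at the end: start over at the beginning
    offset_out = r.head;
    r.head += size;
    return true;
}

// True if [a_off, a_off + a_size) and [b_off, b_off + b_size) share a byte.
// The caller can use this to decide which old buffers must be closed.
inline bool ranges_overlap(uint32_t a_off, uint32_t a_size,
                           uint32_t b_off, uint32_t b_size)
{
    const uint64_t a_end = static_cast<uint64_t>(a_off) + a_size;
    const uint64_t b_end = static_cast<uint64_t>(b_off) + b_size;
    return a_off < b_end && b_off < a_end;
}
```

When the allocation wraps, every table entry whose range overlaps the new one has to be invalidated before the new data is copied in, otherwise a reader could still be pointed at bytes that are being overwritten.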

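But since you did not explain how a buffer becomes "old" and reusable (freed so that it can be reused), I can't really give you an exact implementation. Something like the following would probably do it for you, though.

In the header structure, if m_offset is zero the buffer is not currently allocated, so there is nothing to do with that entry. If m_busy is zero, no process is accessing that buffer. I also introduce an m_free field, which can be 0 or 1. The writer would set it to 1 whenever it needs more buffers to save the data it just received. I won't go too deep on that one since, again, I don't know exactly how you free your buffers. It is not needed if you never free buffers.

0) Structures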

// only if the size varies between runs, otherwise use a constant like:
// namespace { uint32_t const COUNT = 123; }
struct header_count_t
{
    uint32_t    m_size;
};

struct header_t
{
    uint32_t    m_offset;
    uint32_t    m_size;
    LONG        m_busy;  // LONG because the Interlocked...() functions take a LONG volatile *
    uint32_t    m_free;  // 0 or 1; only needed if buffers can be freed (see 3 below)
};

// and from your "ptr" you'd do:
header_count_t *header_count = (header_count_t *) ptr;
header_count->m_size = ...; // your dynamic size (if dynamic it needs to be)
header_t *header = (header_t *) (header_count + 1);
// first buffer will be at: data = (char *) (header + header_count->m_size)
for(size_t n(0); n < header_count->m_size; ++n)
{
   // do work (see below) on header[n]
   ...
}

1) To access the data, the writer must first lock the buffer, and if it is not available, try the next one; locking is done with InterlockedIncrement() and unlocking with InterlockedDecrement():

InterlockedIncrement(&header[n].m_busy);
if(header[n].m_offset == 0)
{
     // buffer not allocated yet, allocate now and copy data,
     // but do not save the offset until "much" later
     uint32_t offset = malloc_buffer();
     // ptr is an int32_t *, so cast it before adding a byte offset
     memcpy((char *) ptr + offset, source_data, size);
     header[n].m_size = size;

     // extra memory barrier to make sure that the data copied
     // in the buffer is all there before we save the offset
     InterlockedIncrement(&header[n].m_busy);
     header[n].m_offset = offset;
     InterlockedDecrement(&header[n].m_busy);
}
InterlockedDecrement(&header[n].m_busy);

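Now, this is not enough if you want to be able to free buffers. In that case another flag is needed to prevent other processes from reusing an old buffer. Again, that depends on your implementation... (see the example below.)

2) To access the data, the reader must first lock the buffer with InterlockedIncrement() and release it with InterlockedDecrement() once done. Note that the lock applies even when m_offset is zero.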

InterlockedIncrement(&header[n].m_busy);
if(header[n].m_offset)
{
    // do something with the buffer
    uint32_t size(header[n].m_size);
    char const *buffer_ptr = (char const *) ptr + header[n].m_offset;
    ...
}
InterlockedDecrement(&header[n].m_busy);

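So here I simply test whether m_offset is set.

3) If you want to be able to free buffers, you also need to test another flag (see below). If that flag is set, the buffer is about to be released (as soon as all processes have let go of it), and the flag can then be used in the previous snippet of code (i.e. the buffer can be taken if either m_offset is zero, or the flag is 1 and the m_busy counter is exactly 1).

For the writer, something like this: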

LONG lock = InterlockedIncrement(&header[n].m_busy);
if(header[n].m_offset == 0
|| (lock == 1 && header[n].m_free == 1))
{
    // new buffer (offset is zero) or reusing an old buffer

    // reset the offset first
    InterlockedIncrement(&header[n].m_busy);
    header[n].m_offset = 0;
    InterlockedDecrement(&header[n].m_busy);
    // then clear m_free
    header[n].m_free = 0;
    InterlockedIncrement(&header[n].m_busy);  // WARNING: you need another Decrement against this one...

    // code as before (malloc_buffer, memcpy, save size & offset...)
    ...
}
InterlockedDecrement(&header[n].m_busy);

In the reader, the test changes to:

if(header[n].m_offset && header[n].m_free == 0)

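As a side note: all the Interlocked...() functions are full memory barriers (fences), so you're all good in that regard. You have to use many of them to make sure you get proper synchronization.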
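If you would rather not depend on the Windows API, a roughly equivalent usage counter can be sketched with std::atomic, whose default memory order (sequentially consistent) is the closest standard counterpart to a full barrier. BusyGuard is a hypothetical helper, not part of any library, and as before this assumes the atomic is lock-free so it can sit in shared memory. Wrapping the counter in a guard also takes care of pairing every increment with a decrement.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical RAII wrapper around the m_busy usage counter.
// fetch_add / fetch_sub use std::memory_order_seq_cst by default.
class BusyGuard
{
public:
    explicit BusyGuard(std::atomic<int32_t>& busy)
        : m_busy(busy)
        , m_count(busy.fetch_add(1) + 1)  // fetch_add returns the previous value
    {
    }

    // The decrement always runs, even on early return or exception.
    ~BusyGuard() { m_busy.fetch_sub(1); }

    BusyGuard(const BusyGuard&) = delete;
    BusyGuard& operator=(const BusyGuard&) = delete;

    // Value of the counter right after our increment, i.e. the same
    // value InterlockedIncrement() would have returned.
    int32_t count_after_increment() const { return m_count; }

private:
    std::atomic<int32_t>& m_busy;
    int32_t               m_count;
};
```

A count of exactly 1 after the increment means no other process held the buffer at that instant, which is the condition the writer tests before reusing a freed entry.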

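Note that this is untested code... but if you want to avoid inter-process semaphores (which may not simplify things much anyway), this is the way to go. Note that the 20 ms sleep() is not required per se, except to avoid pegging one CPU per reader.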

Solution 1
1, accepted, 2014-06-15 05:40:10