Cross Process Memory Barrier

I'm using memory-mapped files for cross-process data sharing.

I have two processes, one that writes data chunks and one or several others that read these chunks. In order for the readers to know whether a chunk is ready, I write two "tag" values, one at the start and one at the end of each chunk, to signal that it is ready.

It looks something like this:

NOTE: In this example I leave out the fact that the reader processes can seek back to previous chunks.

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

static const int32_t START_TAG = 0xFAFAFAFA;
static const int32_t END_TAG = 0x06060606;

std::vector<int32_t> generate_chunk();                  // produces the next chunk
void process_chunk(const std::vector<int32_t>& chunk);  // consumes a chunk

void writer_process(int32_t* memory_mapped_file_ptr)
{
    auto ptr = memory_mapped_file_ptr;
    while (true)
    {
        std::vector<int32_t> chunk = generate_chunk();
        // data goes after the two header slots (start tag + size)
        std::copy(chunk.begin(), chunk.end(), ptr + 2);

        // We are done writing. Write the tags.

        *ptr = START_TAG;
        ptr += 1;
        *ptr = chunk.size();
        ptr += 1 + chunk.size();
        *ptr = END_TAG;
        ptr += 1;
    }
}

void reader_process(int32_t* memory_mapped_file_ptr)
{
    auto ptr = memory_mapped_file_ptr;
    while (true)
    {
        auto ptr2 = ptr;

        std::this_thread::sleep_for(std::chrono::milliseconds(20));

        if (*ptr2 != START_TAG)
            continue;

        ptr2 += 1;

        auto len = *ptr2;
        ptr2 += 1;

        if (*(ptr2 + len) != END_TAG)
            continue;

        std::vector<int32_t> chunk(ptr2, ptr2 + len);

        process_chunk(chunk);
    }
}

This kind of works so far, but it looks to me like a very bad idea that could lead to all kinds of weird bugs due to cache behaviour.

Is there a better way to achieve this?

I've looked at:

  • Message queues: inefficient, and they only work with a single reader. I also cannot seek back to previous chunks.

  • Mutexes: not sure how to lock only the current chunk instead of the entire memory. I can't have a mutex for every possible chunk (especially as they have dynamic sizes). I've considered partitioning the memory into blocks with one mutex each, but that won't work for me because of the delay it incurs between writing and reading.

As mentioned by others, you need some kind of memory barrier to make sure that things are properly synchronized between multiple processors (and processes).

I would suggest you change your scheme to use a header that defines the set of currently available entries, and to use an interlocked increment whenever a new entry becomes available.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms683614%28v=vs.85%29.aspx
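
Before going into the full layout below, here is a minimal sketch of that publish-by-increment idea. The layout and the names (shared_header_t, m_available, entry_t) are assumptions for the sketch only, not part of the structure suggested further down; it just shows the writer filling an entry completely and only then making it visible with one InterlockedIncrement():

#include <windows.h>
#include <cstdint>

// Sketch only: the mapped view is assumed to start with this header,
// followed by fixed-size entries.
struct shared_header_t
{
    volatile LONG m_available;   // number of entries fully written so far
};

struct entry_t
{
    uint32_t m_size;
    int32_t  m_data[256];        // fixed size, just for the sketch
};

// Writer: fill the next slot completely, then publish it with a single
// InterlockedIncrement() (a full memory barrier on Windows).
void publish_next(shared_header_t* header, entry_t* entries, const entry_t& src)
{
    LONG next = header->m_available;            // only the writer advances this
    entries[next] = src;                        // write the payload first...
    InterlockedIncrement(&header->m_available); // ...then make it visible
}

// Reader: any entry with index < published_count() is safe to read.
LONG published_count(shared_header_t* header)
{
    // atomic read with a full barrier (compare-exchange against itself)
    return InterlockedCompareExchange(&header->m_available, 0, 0);
}

Readers that can seek simply remember which indices they already processed; nothing below m_available ever changes once published.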

The structure I would suggest is something like this, so you can actually achieve what you want, and do it quickly:

// at the very start, the number of buffers you might have total
uint32_t   m_size;    // if you know the max. number maybe use a const instead...

// then m_size structures, one per buffer:
uint32_t   m_offset0;  // offset to your data
uint32_t   m_size0;    // size of that buffer
uint32_t   m_busy0;    // whether someone is working on the buffer
uint32_t   m_offset1;
uint32_t   m_size1;
uint32_t   m_busy1;
...
uint32_t   m_offsetN;
uint32_t   m_sizeN;
uint32_t   m_busyN;

With the offset and size you gain direct access to any buffer in your mapped area. To allocate a buffer, you probably want to implement something similar to what malloc() does, although all the necessary information is found right here in this table, so there is no need for chained lists, etc. However, if you want to be able to free some buffers, you'll need to keep track of their sizes. And if you allocate/free all the time, you'll have fun with fragmentation. Anyway...
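
As a tiny illustration of that malloc()-like idea under the simplest possible assumptions (a single writer, nothing is ever freed, so no fragmentation to deal with; the names m_used, data_start and data_capacity are made up for the sketch):

#include <cstdint>

// Sketch: the writer hands out offsets from a high-water mark over the data
// area of the mapping and never frees anything.
struct allocator_state_t
{
    uint32_t m_used;   // bytes handed out so far (only the writer touches this)
};

// Returns an offset into the mapped area, or 0 on failure
// (offset 0 is reserved to mean "not allocated", as in the header table).
uint32_t malloc_buffer(allocator_state_t* state,
                       uint32_t data_start, uint32_t data_capacity,
                       uint32_t size)
{
    size = (size + 3u) & ~3u;                  // keep buffers int32_t aligned
    if(size > data_capacity - state->m_used)
        return 0;                              // out of room (nothing is reused)
    uint32_t offset = data_start + state->m_used;
    state->m_used += size;
    return offset;
}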

Another way is to make use of a ring buffer (a "pipe" in essence), so you always allocate after the last buffer and, if there is not enough room there, allocate at the very start, closing as many old buffers as the new buffer's size requires... This would probably be easier to implement. However, it means you probably need to know where to start when looking for a buffer (i.e. have an index for what is currently considered the "first" [oldest] buffer, which will happen to be the next one to be reused.)
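
A rough sketch of that ring-style allocation, illustrative only; it deliberately glosses over closing the old buffers that get overwritten after a wrap, and the names m_write_pos and m_oldest are assumptions:

#include <cstdint>

// Allocate 'size' bytes after the last buffer, wrapping to the start of the
// data area when the new buffer does not fit before the end.
struct ring_state_t
{
    uint32_t m_write_pos;   // next free byte, relative to the data area
    uint32_t m_oldest;      // header index of the oldest buffer (next to reuse)
};

// Returns an offset relative to the data area, or UINT32_MAX if it can never fit.
uint32_t ring_alloc(ring_state_t* state, uint32_t data_capacity, uint32_t size)
{
    if(size > data_capacity)
        return UINT32_MAX;

    if(state->m_write_pos + size > data_capacity)
        state->m_write_pos = 0;   // wrap around; the buffers still living here
                                  // must first be closed via their header entries
    uint32_t offset = state->m_write_pos;
    state->m_write_pos += size;
    return offset;
}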

But since you do not explain how a buffer becomes "old" and reusable (freed so it can be reused), I cannot really give you an exact implementation. But something like the following would probably do it for you.

In the header structure, if m_offset is zero, then the buffer is not currently allocated and there is nothing to do with that entry. If m_busy is zero, no process is accessing that buffer. I also present an m_free field which can be 0 or 1. The writer would set that flag to 1 whenever it needs more buffers to save the data it just received. I don't go too deep into that one because, again, I do not know exactly how you free your buffers. It is not required if you never free the buffers.

0) Structures

// only if the size varies between runs, otherwise use a constant like:
// namespace { uint32_t const COUNT = 123; }
struct header_count_t
{
    uint32_t    m_size;
};

struct header_t
{
    uint32_t    m_offset;
    uint32_t    m_size;
    uint32_t    m_busy;  // to use with Interlocked...() you may want to use LONG instead
    uint32_t    m_free;  // optional: only needed if buffers can be freed (see 3 below)
};

// and from your "ptr" you'd do:
header_count_t *header_count = (header_count_t *) ptr;
header_count->m_size = ...; // your dynamic size (if dynamic it needs to be)
header_t *header = (header_t *) (header_count + 1);
// first buffer will be at: data = (char *) (header + header_count->m_size)
for(size_t n(0); n < header_count->m_size; ++n)
{
   // do work (see below) on header[n]
   ...
}
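
In case it helps, this is roughly how the shared "ptr" used above could be obtained. It is only a sketch: the mapping here is backed by the page file and the name and size are made-up examples; a real file (CreateFile + CreateFileMapping) works the same way:

#include <windows.h>
#include <cstdint>

// Every process runs essentially the same code and maps the same named area.
void* open_shared_area(uint32_t total_bytes)
{
    HANDLE mapping = CreateFileMappingW(
        INVALID_HANDLE_VALUE,        // page-file backed, just for this sketch
        nullptr, PAGE_READWRITE,
        0, total_bytes,
        L"Local\\chunk_exchange");   // example name shared by all processes
    if(mapping == nullptr)
        return nullptr;

    // map the whole area; the handle stays open for the life of the mapping
    return MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, 0);
}

The returned pointer is then cast to header_count_t* / header_t* exactly as shown above.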

1) The writer, to access the data, must first lock the buffer; if it is not available, it tries again with the next one. Locking is done with InterlockedIncrement() and unlocking with InterlockedDecrement():

InterlockedIncrement(&header[n].m_busy);
if(header[n].m_offset == 0)
{
     // buffer not allocated yet, allocate now and copy data,
     // but do not save the offset until "much" later
     uint32_t offset = malloc_buffer();
     memcpy(ptr + offset, source_data, size);
     header[n].m_size = size;

     // extra memory barrier to make sure that the data copied
     // in the buffer is all there before we save the offset
     InterlockedIncrement(&header[n].m_busy);
     header[n].m_offset = offset;
     InterlockedDecrement(&header[n].m_busy);
}
InterlockedDecrement(&header[n].m_busy);
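
To make the "try again with the next one" part concrete, the snippet above could be wrapped in an outer scan along these lines (a sketch; try_write_buffer() is a hypothetical helper containing exactly the body shown above and returning true once the data has been stored):

// hypothetical helper: the locked body from step 1, true on success
bool try_write_buffer(header_t& entry, char* ptr,
                      char const* source_data, uint32_t size);

bool write_chunk(header_count_t* header_count, header_t* header,
                 char* ptr, char const* source_data, uint32_t size)
{
    for(uint32_t n(0); n < header_count->m_size; ++n)
    {
        if(try_write_buffer(header[n], ptr, source_data, size))
            return true;   // stored in entry n
    }
    return false;          // every entry was busy or already allocated
}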

Now this won't be enough if you want to be able to free a buffer. In that case, another flag is necessary to prevent other processes from reusing an old buffer. Again, that will depend on your implementation... (see the example below.)

2) A reader, to access the data, must first lock the buffer with an InterlockedIncrement(); once done with the buffer, it needs to release it with InterlockedDecrement(). Note that the lock applies even when m_offset is still zero (unallocated).

InterlockedIncrement(&header[n].m_busy);
if(header[n].m_offset)
{
    // do something with the buffer
    uint32_t size(header[n].m_size);
    char const *buffer_ptr = ptr + header[n].m_offset;
    ...
}
InterlockedDecrement(&header[n].m_busy);

So here I just test whether m_offset is set.

3) If you want to be able to free a buffer, you also need to test another flag (see below). If that other flag is set, the buffer is about to be freed (as soon as all processes have released it), and that flag can then be used in the previous code snippet (i.e. either m_offset is zero, or that flag is 1 and the m_busy counter is exactly 1.)

Something like this for the writer:

LONG lock = InterlockedIncrement(&header[n].m_busy);
if(header[n].m_offset == 0
|| (lock == 1 && header[n].m_free == 1))
{
    // new buffer (offset still 0) or reusing an old buffer

    // reset the offset first
    InterlockedIncrement(&header[n].m_busy);
    header[n].m_offset = 0;
    InterlockedDecrement(&header[n].m_busy);
    // then clear m_free
    header[n].m_free = 0;
    InterlockedIncrement(&header[n].m_busy);  // WARNING: you need another Decrement against this one...

    // code as before (malloc_buffer, memcpy, save size & offset...)
    ...
}
InterlockedDecrement(&header[n].m_busy);

And in the reader the test changes to:

if(header[n].m_offset && header[n].m_free == 0)
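
Putting 2) and 3) together, one reader pass over entry n would then look something like this (a sketch that just merges the snippets above; process_buffer() is a hypothetical consumer function):

InterlockedIncrement(&header[n].m_busy);
if(header[n].m_offset && header[n].m_free == 0)
{
    // allocated and not about to be recycled: safe to read
    uint32_t size(header[n].m_size);
    char const *buffer_ptr = ptr + header[n].m_offset;
    process_buffer(buffer_ptr, size);   // hypothetical consumer
}
InterlockedDecrement(&header[n].m_busy);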

As a side note: all the Interlocked...() functions are full memory barriers (fences), so you're all good in that regard. You have to use many of them to make sure that you get the right synchronization.

Note that this is untested code... but if you want to avoid inter-process semaphores (which would probably not simplify things much), this is the way to go. Note that the 20ms sleep() is not required in itself, except, obviously, to avoid one pegged CPU per reader.
