Do I need to make a type a POD to persist it with a memory-mapped file?

Question

Pointers cannot be persisted directly to a file, because they hold absolute addresses. To address this issue I wrote a relative_ptr template that holds an offset instead of an absolute address.
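For readers who want something concrete, here is a minimal sketch of what such a template might look like. It is not the asker's actual relative_ptr (which is not shown); the member names, the choice of a self-relative offset, and the use of 0 as the null value are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical self-relative pointer: stores the distance from its own
// address to the target rather than the target's absolute address.
template <typename T>
class relative_ptr {
    std::ptrdiff_t offset_ = 0;  // 0 means "null" in this sketch

    std::uintptr_t self() const { return reinterpret_cast<std::uintptr_t>(this); }

public:
    relative_ptr() = default;
    // Copying the raw offset to another address would change the target,
    // so copies are disabled in this sketch.
    relative_ptr(const relative_ptr&) = delete;
    relative_ptr& operator=(const relative_ptr&) = delete;

    void set(T* target) {
        offset_ = target
            ? static_cast<std::ptrdiff_t>(reinterpret_cast<std::uintptr_t>(target) - self())
            : 0;
    }

    T* get() const {
        if (offset_ == 0) return nullptr;
        return reinterpret_cast<T*>(self() + static_cast<std::uintptr_t>(offset_));
    }

    T& operator*() const { assert(offset_ != 0); return *get(); }
};

// A record that could live in a mapped region: pointer and target sit in
// the same block, so the stored offset stays valid wherever it is mapped.
struct record {
    int value;
    relative_ptr<int> to_value;
};
```

Because the offset is measured from the pointer itself, the pointer and its target have to move together, which is exactly what happens when the whole file is mapped at a different base address.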

Based on the fact that only trivially copyable types can be safely copied bit by bit, I assumed that this type needed to be trivially copyable to be safely persisted in a memory-mapped file and retrieved later on.

This restriction turned out to be a bit problematic, because the compiler-generated copy constructor does not behave in a meaningful way. I found nothing that forbids me from defaulting the copy constructor and making it private, so I made it private to avoid accidental copies that would lead to undefined behaviour.

Later on, I found boost::interprocess::offset_ptr, whose creation was driven by the same needs. However, it turns out that offset_ptr is not trivially copyable, because it implements its own custom copy constructor.
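To illustrate why a self-relative pointer would want a custom copy constructor at all, here is a rough sketch. It is not Boost's implementation; the class name rebasing_ptr and its members are made up for this example, and null handling is left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical self-relative pointer to int whose copy operations
// recompute the offset relative to the new object's address.
class rebasing_ptr {
    std::ptrdiff_t offset_;

    static std::ptrdiff_t distance(const void* from, const void* to) {
        return static_cast<std::ptrdiff_t>(reinterpret_cast<std::uintptr_t>(to) -
                                           reinterpret_cast<std::uintptr_t>(from));
    }

public:
    explicit rebasing_ptr(int* target) : offset_(distance(this, target)) {}

    // The copy lives at a different address, so the offset is re-anchored.
    rebasing_ptr(const rebasing_ptr& other) : offset_(distance(this, other.get())) {}
    rebasing_ptr& operator=(const rebasing_ptr& other) {
        offset_ = distance(this, other.get());
        return *this;
    }

    int* get() const {
        return reinterpret_cast<int*>(reinterpret_cast<std::uintptr_t>(this) +
                                      static_cast<std::uintptr_t>(offset_));
    }
    std::ptrdiff_t raw_offset() const { return offset_; }
};

// A user-provided copy constructor makes the type non-trivially copyable.
static_assert(!std::is_trivially_copyable<rebasing_ptr>::value,
              "expected a non-trivially copyable type");
```

Two copies of the same pointer hold different raw offsets yet refer to the same target. A bitwise copy of one pointer on its own would therefore point somewhere else, whereas copying a whole block that contains both pointer and target keeps the relationship intact.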

Is my assumption that the smart pointer needs to be trivially copyable to be persisted safely wrong?

If there's no such restriction, I wonder if I can safely do the following as well. If not, exactly what requirements must a type fulfill to be usable in the scenario I described above?

#include <iostream>
#include <new>
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>

struct base {
    int x;
    virtual void f() = 0;
    virtual ~base() {} // virtual members!
};

struct derived : virtual base {
    int x;
    void f() { std::cout << x; }
};

using namespace boost::interprocess;

void persist() {
    file_mapping file("blah", read_write);
    mapped_region region(file, read_write, 128, sizeof(derived));
    // create object on a memory-mapped file
    derived* d = new (region.get_address()) derived();
    d->x = 42;
    d->f();
    region.flush();
}

void retrieve() {
    file_mapping file("blah", read_write);
    mapped_region region(file, read_write, 128, sizeof(derived));
    // reinterpret the mapped bytes as the object written by persist()
    derived* d = static_cast<derived*>(region.get_address());
    d->f();
}

int main() {
    persist();
    retrieve();
}

Thanks to all those who provided alternatives. It's unlikely that I will be using something else any time soon because, as I explained, I already have a working solution. And as you can see from the question marks above, I'm really interested in knowing why Boost can get away without a trivially copyable type, and how far you can go with it: it's quite obvious that classes with virtual members will not work, but where do you draw the line?

Answer 1

To avoid confusion, let me restate the problem.

You want to create an object in mapped memory in such a way that, after the application is closed and reopened, the file can be mapped once again and the object used without further deserialization.

POD is kind of a red herring for what you are trying to do. You don't need to be binary copyable (which is what POD gives you); you need to be address-independent.

Address-independence requires you to:

avoid all absolute pointers.
only use offset pointers to addresses within the mapped memory.
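As a rough illustration of these two rules, the sketch below builds a small linked list inside a byte buffer using offsets from the start of the region, then copies the buffer elsewhere to stand in for the file being mapped at a different address. The header and node layouts and the convention that offset 0 means "null" are assumptions for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout: every link is a byte offset from the region start.
struct header { std::uint32_t first; };                     // offset of first node
struct node   { std::int32_t value; std::uint32_t next; };  // next == 0 ends the list

// Build the list 10 -> 20 -> 30 directly behind the header.
inline void build(unsigned char* region) {
    header h{static_cast<std::uint32_t>(sizeof(header))};
    std::memcpy(region, &h, sizeof h);
    std::uint32_t at = h.first;
    for (int i = 1; i <= 3; ++i) {
        std::uint32_t next = i < 3 ? at + static_cast<std::uint32_t>(sizeof(node)) : 0u;
        node n{i * 10, next};
        std::memcpy(region + at, &n, sizeof n);
        at += static_cast<std::uint32_t>(sizeof(node));
    }
}

// Walk the list using only the region base and the stored offsets.
inline int sum(const unsigned char* region) {
    header h;
    std::memcpy(&h, region, sizeof h);
    int total = 0;
    for (std::uint32_t at = h.first; at != 0;) {
        node n;
        std::memcpy(&n, region + at, sizeof n);
        total += n.value;
        at = n.next;
    }
    return total;
}
```

Since nothing in the buffer depends on where the buffer lives, a byte-for-byte copy at another address can be traversed with the same code.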

There are a few corollaries that follow from these rules.

You can't use virtual anything. C++ virtual functions are implemented with a hidden vtable pointer in the class instance. The vtable pointer is an absolute pointer over which you don't have any control.
You need to be very careful about the other C++ objects your address-independent objects use. Basically anything in the standard library may break if you use it. Even if it doesn't use new, it may use virtual functions internally, or just store an absolute pointer.
You can't store references in the address-independent objects. Reference members are just syntactic sugar over absolute pointers.
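Some of these corollaries can be checked by the compiler. The trait below is a hypothetical helper, not part of any library: it rejects polymorphic types and anything that is not standard-layout, which covers virtual functions, virtual bases and reference members. It cannot detect raw pointers or standard library members that happen to be standard-layout, so passing it is necessary rather than sufficient.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical compile-time guard for types placed in mapped memory.
template <typename T>
struct looks_mappable
    : std::integral_constant<bool,
          !std::is_polymorphic<T>::value &&        // no hidden vtable pointer
          std::is_standard_layout<T>::value> {};   // no virtual bases or reference members

struct plain          { int x; double y; };
struct with_virtual   { int x; virtual void f() {} virtual ~with_virtual() {} };
struct with_reference { int& r; };

static_assert(looks_mappable<plain>::value, "plain data should pass");
static_assert(!looks_mappable<with_virtual>::value, "virtual members should be rejected");
```

A static_assert on such a trait at the point where objects are constructed in the region would at least catch the accidental introduction of a virtual function later on.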

Inheritance is still possible but of limited usefulness, since virtual is outlawed.

Any and all constructors and destructors are fine as long as the above rules are followed.

Even Boost.Interprocess isn't a perfect fit for what you're trying to do. Boost.Interprocess also needs to manage shared access to the objects, whereas you can assume that you're the only one messing with the memory.

In the end it may be simpler and saner to just use Google Protobufs and conventional serialization.

Answer 2

Yes, but for reasons other than the ones that seem to concern you.

You've got virtual functions and a virtual base class. These lead to a host of pointers created behind your back by the compiler. You can't turn them into offsets or anything else.

If you want to do this style of persistence, you need to eschew 'virtual'. After that, it's all a matter of the semantics. Really, just pretend you were doing this in C.
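A C-style version of the idea might look like the sketch below. It assumes a POSIX system (open, ftruncate, mmap) rather than Boost, and the counter_file layout, the kMagic value and the bump function are all invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Plain data only: fixed-width integers, no pointers, no virtual members.
struct counter_file {
    std::uint32_t magic;    // hypothetical tag marking an initialised file
    std::uint32_t counter;
};

const std::uint32_t kMagic = 0x43544631u;  // arbitrary value for this sketch

// Map the file, initialise the record on first use, increment the counter
// and return its new value. Returns 0 on any error.
inline std::uint32_t bump(const char* path) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return 0;
    if (ftruncate(fd, sizeof(counter_file)) != 0) { close(fd); return 0; }
    void* p = mmap(nullptr, sizeof(counter_file), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (p == MAP_FAILED) return 0;
    counter_file* rec = static_cast<counter_file*>(p);
    if (rec->magic != kMagic) { rec->magic = kMagic; rec->counter = 0; }
    std::uint32_t value = ++rec->counter;
    msync(p, sizeof(counter_file), MS_SYNC);
    munmap(p, sizeof(counter_file));
    return value;
}
```

Each call maps the file afresh, possibly at a different address, and still finds the previous state, because the record contains nothing but values.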

Answer 3

Even POD has pitfalls if you are interested in interoperating across different systems or across time.

You might look at Google Protocol Buffers for a way to do this in a portable fashion.

Answer 4

Not as much an answer as a comment that grew too big:

I think it's going to depend on how much safety you're willing to trade for speed and ease of use. In the case where you have a struct like this:

struct S { char c; double d; };

You have to consider padding and the fact that some architectures might not allow you to access a double unless it is aligned on a proper memory address. Adding accessor functions and fixing the padding tackles this, and the structure is still memcpy-able, but now we're entering territory where we're not really gaining much of a benefit from using a memory-mapped file.
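One way to "fix the padding" is to spell it out and let the compiler verify the result. The sketch below is only an illustration: S_explicit and the choice of putting the double at offset 8 are assumptions, and the static_asserts will fail on a platform where double is not 8 bytes, which is the point of having them.

```cpp
#include <cassert>
#include <cstddef>

struct S { char c; double d; };

// Same members with the padding written out, so the on-disk layout is
// explicit instead of being left to the compiler.
struct S_explicit {
    char   c;
    char   pad[7];  // assumption: the double should sit at offset 8
    double d;
};

static_assert(offsetof(S_explicit, d) == 8, "unexpected offset for S_explicit::d");
static_assert(sizeof(S_explicit) == 16, "unexpected size for S_explicit");
```

With the layout pinned down this way, a build on a compiler or platform that disagrees stops at compile time instead of silently reading a file with a different layout.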

Since it seems like you'll only be using this locally and in a fixed setup, relaxing the requirements a little seems OK, so we're back to using the above struct normally. Now, does the type have to be trivially copyable? I don't necessarily think so; consider this (probably broken) class:

#include <cstddef>
#include <iostream>
#include <utility>

enum Endian { LittleEndian, BigEndian };
template<typename T, Endian e> struct PV {
    // raw bytes and value share storage so the bytes can be reversed
    union {
        unsigned char b[sizeof(T)];
        T x;
    } val;

    // assignment from a PV with a possibly different byte order
    template<Endian oe> PV& operator=(const PV<T,oe>& rhs) {
        val.x = rhs.val.x;
        if (e != oe) {
            for (std::size_t b = 0; b < sizeof(T) / 2; b++) {
                std::swap(val.b[sizeof(T)-1-b], val.b[b]);
            }
        }
        return *this;
    }
};

Assignment between different byte orders is not a bitwise copy (strictly speaking the type itself is still trivially copyable, because a member template is never a copy assignment operator), so in general you can't just use memcpy to convert between PV types. Still, I don't see anything immediately wrong with using a class like this in the context of a memory-mapped file, especially if the file matches the native byte order.
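For comparison, here is a different technique from the union above: a wrapper that always stores its bytes in one fixed order and converts on access with shifts. The name le_u32 and its interface are made up for this sketch. It trades a little work on every access for a file format that does not depend on the host.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical wrapper: a 32-bit unsigned value stored little-endian,
// byte by byte, regardless of the host's byte order.
struct le_u32 {
    unsigned char b[4];

    void set(std::uint32_t v) {
        for (int i = 0; i < 4; ++i)
            b[i] = static_cast<unsigned char>(v >> (8 * i));
    }
    std::uint32_t get() const {
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i)
            v |= static_cast<std::uint32_t>(b[i]) << (8 * i);
        return v;
    }
};

static_assert(std::is_trivially_copyable<le_u32>::value, "plain bytes only");
static_assert(sizeof(le_u32) == 4, "no padding expected");
```

Because it is only an array of bytes, it also has no alignment requirement, so it can sit at any offset in the mapped file.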

Update:
Where do you draw the line?

I think a decent rule of thumb is: if the equivalent C code is acceptable and C++ is just being used as a convenience, to enforce type safety or proper access, it should be fine.

That would make boost::interprocess::offset_ptr OK, since it's just a helpful wrapper around a ptrdiff_t with special semantic rules. In the same vein, struct PV above would be OK, as it's just meant to byte swap automatically, though like in C you have to be careful to keep track of the byte order and assume that the structure can be trivially copied. Virtual functions wouldn't be OK, as the C equivalent, function pointers in the structure, wouldn't work. However, something like the following (untested) code would again be OK:

struct Foo { 
    unsigned char obj_type;
    void vfunc1(int arg0) { vtables[obj_type].vfunc1(this, arg0); }
};
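A fuller version of that idea might look like the sketch below. The VTable struct, the two implementation functions, the payload member and the int return type are assumptions added so the example can be compiled and checked; only obj_type and the vtables lookup come from the snippet above.

```cpp
#include <cassert>

struct Foo;

// One entry per object type. The table lives in the program, not the file.
struct VTable {
    int (*vfunc1)(const Foo* self, int arg0);
};

struct Foo {
    unsigned char obj_type;  // index into vtables, safe to persist
    int payload;
    int vfunc1(int arg0) const;
};

inline int add_impl(const Foo* self, int arg0) { return self->payload + arg0; }
inline int mul_impl(const Foo* self, int arg0) { return self->payload * arg0; }

// Hypothetical table. The order must stay fixed once files exist,
// because the files store indices into it.
const VTable vtables[] = { {add_impl}, {mul_impl} };
const unsigned char kTypeCount = sizeof(vtables) / sizeof(vtables[0]);

inline int Foo::vfunc1(int arg0) const {
    assert(obj_type < kTypeCount);  // the tag came from a file, so check it
    return vtables[obj_type].vfunc1(this, arg0);
}
```

The persisted object holds only a small integer tag. The function addresses are looked up at run time in whatever process has the file mapped, so they never end up in the file.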

Answer 5

That is not going to work. Your class derived is not a POD, therefore how it is laid out depends on how the compiler compiles your code. In other words: do not do it.

By the way, where are you releasing your objects? I see you are creating your objects in place, but you are not calling the destructor.

Answer 6

Absolutely not. Serialisation is a well-established technique that is used in numerous situations, and it certainly does not require PODs. What it does require is that you specify a well-defined serialisation binary interface (SBI).

Serialisation is needed any time your objects leave the runtime environment, including shared memory, pipes, sockets, files, and many other persistence and communication mechanisms.

Where PODs help is where you know you are not leaving the processor architecture. If you will never be changing versions between writers of the object (serialisers) and readers (deserialisers), and you have no need for dynamically sized data, then PODs allow easy memcpy-based serialisers.
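A memcpy-based serialiser of that kind can be very small. The helper names append_raw and read_raw and the sample struct below are assumptions for illustration. The output is only readable by a program built with the same layout and byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Append the raw object representation of `value` to `out`.
template <typename T>
void append_raw(std::vector<unsigned char>& out, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy-style serialisation needs a trivially copyable type");
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    out.insert(out.end(), p, p + sizeof(T));
}

// Read a T back from `in`, starting at byte `offset`.
template <typename T>
T read_raw(const std::vector<unsigned char>& in, std::size_t offset) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy-style serialisation needs a trivially copyable type");
    assert(offset + sizeof(T) <= in.size());
    T value;
    std::memcpy(&value, in.data() + offset, sizeof(T));
    return value;
}

struct sample { int id; double weight; };
```

Note that any padding bytes inside the struct are copied as well, so the byte stream is tied to one particular compiler layout.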

Commonly, though, you need to store things like strings. Then you need a way to store and retrieve the dynamic information. Sometimes zero-terminated strings are used, but that is pretty specific to strings and doesn't work for vectors, maps, arrays, lists, etc. You will often see strings and other dynamic elements serialised as [size][element 1][element 2]…; this is the Pascal-style (length-prefixed) array format. Additionally, when dealing with cross-machine communication, the SBI must define integer formats to deal with potential endianness issues.
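As a small sketch of the [size][element 1][element 2]… idea, the functions below write a string as a 32-bit little-endian length followed by its bytes. The function names and the choice of a 32-bit little-endian length are assumptions. A real SBI would have to fix such details itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using bytes = std::vector<unsigned char>;

// Write a 32-bit value least significant byte first, whatever the host is.
inline void put_u32(bytes& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<unsigned char>(v >> (8 * i)));
}

inline std::uint32_t get_u32(const bytes& in, std::size_t& pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[pos++]) << (8 * i);
    return v;
}

// [size][char 1][char 2]...
inline void put_string(bytes& out, const std::string& s) {
    put_u32(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

inline std::string get_string(const bytes& in, std::size_t& pos) {
    std::uint32_t n = get_u32(in, pos);
    assert(pos + n <= in.size());
    std::string s(in.begin() + pos, in.begin() + pos + n);
    pos += n;
    return s;
}
```

The same pattern extends to vectors and other containers: write the element count first, then each element in turn.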

Now, pointers are usually implemented by IDs, not offsets. Each object that needs to be serialised can be given an incrementing number as an ID, and that can be the first field in the SBI. The reason you usually don't use offsets is that you may not be able to easily calculate future offsets without going through a sizing step or a second pass. IDs can be calculated inside the serialisation routine on the first pass.
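The ID scheme could be sketched roughly as follows for a singly linked chain. Node, NodeRecord, to_records and the convention that ID 0 means "no object" are assumptions for this example, and the chain is assumed to be acyclic.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

struct Node {
    int value;
    const Node* next;  // absolute pointer, only meaningful in this process
};

struct NodeRecord {    // what would actually be written out
    std::uint32_t id;
    std::int32_t  value;
    std::uint32_t next_id;  // 0 means "no next node" in this sketch
};

// Single pass: hand out IDs 1, 2, 3, ... the first time a node is seen,
// including nodes that are only referenced and not yet visited.
inline std::vector<NodeRecord> to_records(const Node* head) {
    std::map<const Node*, std::uint32_t> ids;
    std::uint32_t next_free = 1;
    auto id_of = [&](const Node* n) -> std::uint32_t {
        if (!n) return 0;
        auto it = ids.find(n);
        if (it == ids.end()) it = ids.emplace(n, next_free++).first;
        return it->second;
    };
    std::vector<NodeRecord> out;
    for (const Node* n = head; n; n = n->next) {
        std::uint32_t id = id_of(n);
        out.push_back(NodeRecord{id, n->value, id_of(n->next)});
    }
    return out;
}
```

A forward reference gets its ID before the referenced object has been written, which is what makes a single pass sufficient. The reader rebuilds the pointers from an ID-to-object table.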

Additional ways to serialise include text-based serialisers using a syntax like XML or JSON. These are parsed using standard textual tools, and the parsed result is used to reconstruct the object. They keep the SBI simple at the cost of pessimising performance and bandwidth.

In the end, you typically build an architecture with serialisation streams that take your objects and translate them member by member to the format of your SBI. In the case of shared memory, the stream typically pushes the members directly onto the memory after acquiring the shared mutex.

This often looks like:

void MyClass::Serialise(SerialisationStream & stream)
{
  stream & member1;
  stream & member2;
  stream & member3;
  // ...
}

where the & operator is overloaded for your different types. You may take a look at Boost.Serialization for more examples.
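To make the pattern concrete, here is a minimal write-only stream with an overloaded & operator. The SerialisationStream class in the answer is not shown, so everything below (the member types, the little-endian 32-bit encoding, the bytes() accessor) is a hypothetical stand-in rather than the answerer's or Boost's actual interface.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical write-only stream that appends to an in-memory buffer.
class SerialisationStream {
    std::vector<unsigned char> bytes_;

public:
    // Fixed-width integers are written least significant byte first.
    SerialisationStream& operator&(std::uint32_t v) {
        for (int i = 0; i < 4; ++i)
            bytes_.push_back(static_cast<unsigned char>(v >> (8 * i)));
        return *this;
    }
    // Strings are written as [size][characters].
    SerialisationStream& operator&(const std::string& s) {
        *this & static_cast<std::uint32_t>(s.size());
        bytes_.insert(bytes_.end(), s.begin(), s.end());
        return *this;
    }
    const std::vector<unsigned char>& bytes() const { return bytes_; }
};

struct MyClass {
    std::uint32_t member1;
    std::string   member2;

    void Serialise(SerialisationStream& stream) const {
        stream & member1;
        stream & member2;
    }
};
```

A matching input stream would overload the same operator to read into the members, which is why libraries in this style can often share one Serialise function for both directions.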


Answer 1: score 8, accepted, 2011-09-07 19:58:56
Answer 2: score 4, 2011-09-04 18:07:02
Answer 3: score 2, 2011-09-04 19:14:20
Answer 4: score 2, 2011-09-04 21:11:06
Answer 5: score 1, 2011-09-04 17:43:59
Answer 6: score 1, 2011-09-04 20:26:24
