mmap and C++ strict aliasing rules

Consider a POSIX.1-2008 compliant operating system, and let fd be a valid file descriptor (to an open file, opened in read mode, with enough data...). The following code adheres to the C++11 standard* (ignore error checking):

void* map = mmap(NULL, sizeof(int)*10, PROT_READ, MAP_PRIVATE, fd, 0);
int* foo = static_cast<int*>(map);

Now, does the following statement break the strict aliasing rules?

int bar = *foo;

According to the standard (C++11 [basic.lval]/10):

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • the dynamic type of the object,
  • a cv-qualified version of the dynamic type of the object,
  • a type similar (as defined in 4.4) to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • a char or unsigned char type.
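To make the list above concrete, here is a minimal sketch of two accesses that stay within it. The function names `float_bits` and `byte_sum` are hypothetical, chosen only for illustration, and the expected values in the usage note assume an IEEE 754 32-bit `float`. Reading a `float` through a `std::uint32_t*` would use a type that is not on the list, so the sketch copies the bytes with `std::memcpy` instead; reading any object through `unsigned char` is explicitly allowed by the last bullet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy a float's object representation into an integer with memcpy, rather
// than dereferencing a std::uint32_t* that points at the float.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Inspecting the bytes of an int through unsigned char is permitted
// by the "char or unsigned char type" bullet.
unsigned byte_sum(int v) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&v);
    unsigned sum = 0;
    for (std::size_t i = 0; i < sizeof v; ++i) {
        sum += bytes[i];
    }
    return sum;
}
```

On an IEEE 754 platform, `float_bits(1.0f)` is `0x3F800000`, and `byte_sum(0x01020304)` is 10 whatever the byte order.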

What's the dynamic type of the object pointed to by map / foo? Is that even an object? The standard says (C++11 [basic.life]/1):

The lifetime of an object of type T begins when: storage with the proper alignment and size for type T is obtained, and if the object has non-trivial initialization, its initialization is complete.

Does this mean that the mapped memory contains 10 int objects (assuming the initial address is suitably aligned)? But if it does, wouldn't the same reasoning also apply to this code (which clearly breaks strict aliasing)?

char baz[sizeof(int)];
int* p=reinterpret_cast<int*>(&baz);
*p=5;

Even more oddly, does that mean that declaring baz starts the lifetime of any (properly aligned) object of size 4?
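For contrast, here is a sketch of the baz snippet written so that it does not depend on the answer to the question: the buffer is given `int` alignment and an `int` is created in it explicitly with placement new, so `*p` accesses an object whose dynamic type really is `int`. The function name `store_and_load` is hypothetical. This does not settle whether the original cast is undefined; it only shows a form that avoids the issue.

```cpp
#include <cassert>
#include <new>

int store_and_load(int value) {
    // The question's `char baz[sizeof(int)]` only has char alignment;
    // alignas(int) makes the storage suitable for an int.
    alignas(int) unsigned char baz[sizeof(int)];

    // Placement new begins the lifetime of an int in baz's storage,
    // so p points to an actual int object.
    int* p = ::new (static_cast<void*>(baz)) int(value);

    // int is trivially destructible, so no explicit destructor call is needed.
    return *p;
}
```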


Some context: I am mmap-ing a file that contains a chunk of data I want to access directly. Since this chunk is large, I'd like to avoid memcpy-ing it into a temporary object.
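One common way to stay clear of the aliasing question without copying the whole chunk is to `memcpy` only the element being read into a local variable. The sketch below assumes the data are plain `int`s stored in native byte order; the helper name `load_int` is hypothetical. Mainstream optimizing compilers typically turn a fixed-size `memcpy` like this into a single load, and because `memcpy` has no alignment requirement it also works if `base` is not `int`-aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy element `index` out of a raw byte region
// (for example, an mmap-ed file) into a local int.
int load_int(const void* base, std::size_t index) {
    int value;
    std::memcpy(&value,
                static_cast<const unsigned char*>(base) + index * sizeof(int),
                sizeof value);
    return value;
}
```

Only `sizeof(int)` bytes are copied per access, so the cost does not grow with the size of the mapped chunk.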


*Can nullptr be used instead of NULL here? Is it implicitly converted to NULL? Any reference from the standard?
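On the footnote: `nullptr` has type `std::nullptr_t`, and a null pointer constant converts implicitly to any pointer type, yielding that type's null pointer value (C++11 [conv.ptr]/1). It therefore becomes a null `void*`, which is what mmap's first parameter expects; it is not converted "to NULL". The sketch below is the question's call with `nullptr`, error checking, and cleanup added. The helper name `read_mapped_int` is hypothetical, and it copies the element out with `memcpy` so that it does not depend on the aliasing question.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdio>   // std::tmpfile and fileno, used in the usage note below
#include <cstring>
#include <type_traits>

// std::nullptr_t converts implicitly to void*.
static_assert(std::is_convertible<std::nullptr_t, void*>::value,
              "nullptr converts to void*");

// Hypothetical helper: map `count` ints from the start of `fd` and copy
// element `index` into `out`. Returns false on bad index or mmap failure.
bool read_mapped_int(int fd, std::size_t count, std::size_t index, int& out) {
    if (index >= count) return false;
    const std::size_t len = sizeof(int) * count;
    void* map = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);  // nullptr instead of NULL
    if (map == MAP_FAILED) return false;
    std::memcpy(&out, static_cast<const unsigned char*>(map) + index * sizeof(int),
                sizeof(int));
    munmap(map, len);
    return true;
}
```

To try it, write a few ints to a `std::tmpfile()`, `std::fflush` it, and pass `fileno(f)` as `fd`.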

I believe simply casting does violate strict aliasing. Arguing that convincingly is above my paygrade, so here is an attempt at a workaround:

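#include <cstring>      // std::memcpy
#include <new>          // placement new
#include <type_traits>  // std::is_pod

// Note: this writes to *ptr, so the memory must be writable. With a
// PROT_READ-only mapping the stores can fault; map with
// PROT_READ | PROT_WRITE and MAP_PRIVATE instead (writes then go to a
// private copy and are never carried through to the file).
template<class T>
T* launder_raw_pod_at( void* ptr ) {
  static_assert( std::is_pod<T>::value, "this only works with plain old data" );
  char buff[sizeof(T)];
  std::memcpy( buff, ptr, sizeof(T) );  // save the existing bytes
  T* r = ::new(ptr) T;                  // start the lifetime of a T at ptr
  std::memcpy( ptr, buff, sizeof(T) );  // restore the saved bytes into the new T
  return r;
}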

I believe the above code has zero observable side effects on memory and returns a valid T* to an object at location ptr.

Check whether your compiler optimizes the above code to a no-op. To do so, it has to understand memcpy at a really fundamental level, and constructing a T must do nothing to the memory there.

At least clang 4.0.0 can optimize this operation away.

What we do is first copy the bytes away. Then we use placement new to create a T there. Finally, we copy the bytes back.

We have a legally created T with exactly the bytes we want in it.

But the copy away and the copy back both go through a local buffer, so they have no observable effect.

Constructing the object, if it is a POD, doesn't have to touch the bytes either; technically the object's value is indeterminate after default-initialization. But smart compilers simply do nothing there.

So the compiler can work out that all of this manipulation can be skipped at runtime. At the same time, in the abstract machine we have properly created an object with the right bytes at that location. (Assuming the address is suitably aligned! But that isn't this code's problem.)
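Since the workaround assumes a suitably aligned address, a caller may want to check that before using it. Below is a minimal sketch of such a check; the name `is_aligned_for` is hypothetical. Converting a pointer to `std::uintptr_t` has an implementation-defined result, but on mainstream platforms with flat addressing the remainder test gives the expected answer. For the question's case, the start of a successful mmap mapping is in practice page-aligned, so an offset-0 mapping satisfies `int`'s alignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if p is suitably aligned for an object of type T.
// Relies on the usual (implementation-defined) pointer-to-integer mapping.
template <class T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

A caller could test `is_aligned_for<T>(ptr)` first and fall back to copying the bytes out with `std::memcpy` when it returns false.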
