Strict aliasing rule and 'char *' pointers

The accepted answer to "What is the strict aliasing rule?" mentions that you can use char * to alias another type, but not the other way around.

It doesn't make sense to me: if we have two pointers, one of type char * and another of type struct something *, pointing to the same location, how is it possible that the first aliases the second but the second doesn't alias the first?

if we have two pointers, one of type char * and another of type struct something * pointing to the same location, how is it possible that the first aliases the second but the second doesn't alias the first?

It does, but that's not the point.

The point is that if you have one or more struct something objects, then you may use a char * to read their constituent bytes, but if you have one or more char objects, then you may not use a struct something * to read them.
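A minimal sketch of the permitted direction, assuming a made-up definition of struct something (the question never gives one) and hypothetical helper names dump_bytes and bytes_round_trip. It reads the object's bytes through unsigned char *, and goes back the other way by copying bytes with memcpy rather than by casting the byte buffer to struct something *.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the question's "struct something". */
struct something {
    int id;
    float weight;
};

/* Permitted: read the bytes of a struct something through a character pointer. */
static void dump_bytes(const struct something *s, unsigned char *out)
{
    assert(s != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)s;
    for (size_t i = 0; i < sizeof *s; i++)
        out[i] = p[i];
}

/* Returns 1 if copying the bytes out and back reproduces the object. */
int bytes_round_trip(void)
{
    struct something a = { 42, 1.5f };
    struct something b;
    unsigned char raw[sizeof a];

    dump_bytes(&a, raw);
    /* Not permitted: *(struct something *)raw.
       Copying the bytes into a real struct something is fine. */
    memcpy(&b, raw, sizeof b);
    return b.id == 42 && b.weight == 1.5f;
}
```

The design choice here is that the only object ever accessed as a struct something really is one; the char buffer is only ever touched as bytes.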

The wording in the referenced answer is slightly erroneous, so let's get that ironed out first: one object never aliases another object, but two pointers can "alias" the same object (meaning the pointers point to the same memory location; as MM pointed out, this is still not 100% correct wording, but you get the idea). Also, the standard itself doesn't (to the best of my knowledge) actually talk about strict aliasing at all, but merely lays out rules that govern through which kinds of expressions an object may or may not be accessed. Compiler flags like -fno-strict-aliasing tell the compiler whether or not it can assume the programmer followed those rules (so that it can perform optimizations based on that assumption).

Now to your question: any object can be accessed through a pointer to char, but a char object (especially a char array) may not be accessed through most other pointer types. Based on that, the compiler is required to make the following assumptions:

  1. If the type of the actual object itself is not known, both char* and T* pointers may always point to the same object (alias each other): a symmetric relationship.
  2. If types T1 and T2 are not "related" and neither is char, then T1* and T2* may never point to the same object: a symmetric relationship.
  3. A char* pointer may point to a char object or an object of any type T.
  4. A T* pointer may not point to a char object: together with rule 3, an asymmetric relationship.
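The first two assumptions in the list above can be sketched with two small functions. The function names are hypothetical, and the comments describe what a compiler is allowed to assume, not what any particular compiler actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* A char * may point into any object, so the compiler has to assume
   that the store through c can change *i and must reload *i. */
int store_then_reload(int *i, char *c)
{
    assert(i != NULL && c != NULL);
    *i = 1;
    *c = 2;        /* may modify one byte of *i */
    return *i;     /* cannot simply be folded to 1 */
}

/* int and float are unrelated types, so the compiler may assume that
   the store through f does not touch *i and may return 1 directly.
   Calling this with both pointers aimed at the same object would
   break the rules. */
int store_int_float(int *i, float *f)
{
    assert(i != NULL && f != NULL);
    *i = 1;
    *f = 2.0f;
    return *i;
}

/* Well-defined: the second argument really does alias the first. */
int overlapping_call(void)
{
    int x = 0;
    return store_then_reload(&x, (char *)&x);
}
```

overlapping_call returns something other than 1; the exact value depends on the byte order and size of int on the target.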

I believe the main rationale behind the asymmetric rules about accessing objects through pointers is that a char array might not satisfy the alignment requirements of, e.g., an int.

So, even without compiler optimizations based on the strict aliasing rule, writing an int to the location of a 4-byte char array at addresses 0x1, 0x2, 0x3, 0x4, for instance, will in the best case result in poor performance and in the worst case access a different memory location, because the CPU instructions might ignore the lowest two address bits when writing a 4-byte value (so here this might result in a write to 0x0, 0x1, 0x2, and 0x3).
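One common way to sidestep both the alignment problem and the aliasing problem is to move the value with memcpy instead of dereferencing a cast pointer. The following is a sketch under that assumption; store_u32 and load_u32 are hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value at an arbitrary, possibly misaligned, byte address.
   memcpy copies byte by byte as far as the language is concerned, so no
   misaligned uint32_t access is ever expressed in the source. */
void store_u32(unsigned char *dst, uint32_t value)
{
    assert(dst != NULL);
    memcpy(dst, &value, sizeof value);
}

/* Load a 32-bit value back from an arbitrary byte address. */
uint32_t load_u32(const unsigned char *src)
{
    uint32_t value;
    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}
```

For example, with unsigned char buf[8], calling store_u32(buf + 1, v) and then load_u32(buf + 1) gives back v without ever forming a uint32_t * into the buffer.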

Please also be aware that the meaning of "related" differs between C and C++, but that is not relevant to your question.

if we have two pointers, one of type char * and another of type struct something * pointing to the same location, how is it possible that the first aliases the second but the second doesn't alias the first?

Pointers don't alias each other; that's sloppy use of language. Aliasing is when an lvalue is used to access an object of a different type. (Dereferencing a pointer gives an lvalue.)

In your example, what's important is the type of the object being aliased. For a concrete example, let's say that the object is a double. Accessing the double by dereferencing a char * pointing at the double is fine because the strict aliasing rule permits this. However, accessing a double by dereferencing a struct something * is not permitted (unless, arguably, the struct starts with double!).
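The double case can be sketched as follows. The function name same_representation is hypothetical; it compares two double objects byte by byte through unsigned char lvalues, which is the access the rule permits.

```c
#include <assert.h>
#include <stddef.h>

/* Compare the object representations of two doubles.
   Each p[i] and q[i] is an lvalue of character type, so accessing the
   double objects this way does not violate the aliasing rule. */
int same_representation(const double *a, const double *b)
{
    assert(a != NULL && b != NULL);
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    for (size_t i = 0; i < sizeof(double); i++) {
        if (p[i] != q[i])
            return 0;
    }
    return 1;
}
```

Note that this compares representations, not values: on IEEE 754 systems 0.0 and -0.0 compare equal with == but have different bytes.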

If the compiler is looking at a function which takes char * and struct something *, and it does not have available the information about the object being pointed to (this is actually unlikely, as aliasing passes are done at a whole-program optimization stage), then it would have to allow for the possibility that the object might actually be a struct something, so no optimization could be done inside this function.

Many aspects of the C++ Standard are derived from the C Standard, which needs to be understood in the historical context in which it was written. If the C Standard were being written to describe a new language which included type-based aliasing, rather than describing an existing language which was designed around the idea that accesses to lvalues were accesses to bit patterns stored in memory, there would be no reason to give any kind of privileged status to the type used for storing characters in a string. Having explicit operations to treat regions of storage as bit patterns would allow optimizations to be simultaneously more effective and safer. Had the C Standard been written in such fashion, the C++ Standard presumably would have been likewise.

As it is, however, the Standard was written to describe a language in which a very common idiom was to copy the values of objects by copying all of the bytes thereof, and the authors of the Standard wanted to allow such constructs to be usable within portable programs.

Further, the authors of the Standard intended that implementations process many non-portable constructs "in a documented manner characteristic of the environment" in cases where doing so would be useful, but waived jurisdiction over when that should happen, since compiler writers were expected to understand their customers' and prospective customers' needs far better than the Committee ever could.

Suppose that in one compilation unit, one has the function:

void copy_thing(char *dest, char *src, int size)
{
  while(size--)
    *(char volatile *)(dest++) = *(char volatile*)(src++);
}

and in another compilation unit:

float f1,f2;
float test(void)
{
  f1 = 1.0f;
  f2 = 2.0f;
  copy_thing((char*)&f2, (char*)&f1, sizeof f1);
  return f2;
}

I think there would have been a consensus among Committee members that no quality implementation should treat the fact that copy_thing never writes to an object of type float as an invitation to assume that the return value will always be 2.0f. There are many things about the above code that should prevent or discourage an implementation from consolidating the read of f2 with the preceding write, with or without a special rule regarding character types, but different implementations would have different reasons for their forbearance.
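For anyone who wants to try this, here is a sketch that places both fragments in a single file. That is an assumption made purely for convenience: it loses the "separate compilation unit" aspect, so the compiler can see everything at once. Because copy_thing accesses the floats only through character-typed lvalues, the copy is well-defined and the read of f2 has to observe it.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise copy, as in the first fragment above. */
void copy_thing(char *dest, char *src, int size)
{
    assert(dest != NULL && src != NULL && size >= 0);
    while (size--)
        *(char volatile *)(dest++) = *(char volatile *)(src++);
}

float f1, f2;

/* As in the second fragment: f2 is overwritten with the bytes of f1. */
float test(void)
{
    f1 = 1.0f;
    f2 = 2.0f;
    copy_thing((char *)&f2, (char *)&f1, sizeof f1);
    return f2;   /* must reflect the copy, so this is f1's value */
}
```

A conforming implementation returns 1.0f from test, whether or not it inlines copy_thing.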

It would be difficult to describe a set of rules which would require that all implementations process the above code correctly without blocking some existing or plausible implementations from implementing what would otherwise be useful optimizations. An implementation that treated all inter-module calls as opaque would handle such code correctly even if it was oblivious to the fact that a cast from T1 to T2 is a sign that an access to a T2 may affect a T1, or the fact that a volatile access might affect other objects in ways a compiler shouldn't expect to understand. An implementation that performed cross-module inlining and was oblivious to the implications of typecasts or volatile would process such code correctly if it refrained from making any aliasing assumptions about accesses via character pointers.

The Committee wanted to recognize something in the above construct that compilers would be required to recognize as implying that f2 might be modified, since the alternative would be to view such a construct as Undefined Behavior despite the fact that it should be usable within portable programs. That they chose access via a character pointer as the aspect that forced the issue was never intended to imply that compilers should be oblivious to everything else, even though unfortunately some compiler writers interpret the Standard as an invitation to do just that.
