Strict aliasing violation and analysis

Question

I understand that violations of the strict aliasing rule can lead to many kinds of trouble. One of them is some kind of "register-cache/memory incoherence". The following code is an example:

#include <iostream>
#include <vector>
#include <cstring>
#include <cstdint>
#include <numeric>


struct F { float x, y; };


int main()
{
  int input;
  std::cin >> input;
  std::cerr << "\n";
  int size = input;
  
  
  constexpr int cap = 4;
  int64_t z[cap]; 
  size = size > cap ? cap : size;
  std::iota(z, z + size, 0);
  
  
  std::cerr << "Initial z[] = ";
  for(int i = 0; i < size; ++i) std::cerr << z[i] << ", ";
  std::cerr << "\n\n";
  
  
  // Type punning, undefined behavior.
  F *f = (F*)z;
  for(int i = 0; i < size; ++i)
  {
    f[i].x = 3.14 * (i + 1);      
    f[i].y = 3.14 * (size - i);
  }
  
  
  std::cerr << "After writing floats z[] = ";
  for(int i = 0; i < size; ++i) std::cerr << z[i] << ", ";
  
  
  while(true);
  return 0;
}

Compile the code using gcc-8.3 (mingw64),

system("C:/rtools40/mingw64/bin/g++.exe -std=gnu++17 -Ofast -o Ofast.exe tmp2.cpp")

on 64-bit Windows 10 with an Intel i9-9980HK, and this is the output:

After writing the floats, z[2] is still 2, which is wrong but expected.

The Standard states that a char* may point to a memory block pointed to by a pointer of any other type, so reads and writes through that char* must be honored by the compiler. So I modified the above code:

#include <iostream>
#include <vector>
#include <cstring>
#include <cstdint>
#include <numeric>


struct F { float x, y; };


int main()
{
  int input;
  std::cin >> input;
  std::cerr << "\n";
  int size = input;
  
  
  constexpr int cap = 4;
  int64_t z[cap]; 
  size = size > cap ? cap : size;
  std::iota(z, z + size, 0);
  
  
  std::cerr << "Initial z[] = ";
  for(int i = 0; i < size; ++i) std::cerr << z[i] << ", ";
  std::cerr << "\n\n";
  
  
  // Undefined behavior.
  F *f = (F*)z;
  for(int i = 0; i < size; ++i)
  {
    f[i].x = 3.14 * (i + 1);      
    f[i].y = 3.14 * (size - i);
  }
  
  
  // The following branch is runtime dependent so cannot be pruned by an
  // aggressive but compliant compiler (ACC).
  // The goal is to make any ACC "fear" that z[]'s contents can be altered
  // via another char*.
  // This complies with the strict aliasing rule since a char* can point to
  // any memory block pointed by another pointer of any type, and read / write
  // the memory thereafter.
  if(input > 2000000000) // Little chance to enter.
  {

    char dummy[cap * sizeof(int64_t)];
    int Nbytes = size * sizeof(int64_t);


    char *zchar = (char*)z; // Set a char* pointer to z[]
    std::memcpy(dummy, zchar, Nbytes);


    // Read and write zchar[], do some dummy arithmetics:
    for(int i = 0; i < Nbytes; ++i) zchar[i] &= zchar[Nbytes - 1 - i];
    int S = std::accumulate(zchar, zchar + Nbytes, 0);


    // Print it so this whole dummy thing cannot be pruned by ACC.
    std::cerr << "dummy result = " << S << "\n\n";


    // Recover.
    std::memcpy(zchar, dummy, Nbytes);
  }
  
  
  // Because the same memory block as z[] could be rewritten via a char*, 
  // the ACC has to achieve some sort of register-cache/memory coherence, 
  // thus the right output.
  std::cerr << "After writing floats z[] = ";
  for(int i = 0; i < size; ++i) std::cerr << z[i] << ", ";
  
  
  while(true);
  return 0;
}

And the result becomes correct:

I can even simplify the dummy branch to

if(input > 2000000000) // Little chance to enter.
{
   int S = std::accumulate((char*)z, (char*)(z + size), int(0));
   std::cerr << "dummy result = " << S << "\n\n";
}

And it still produces the correct result.

Is my reasoning about the strict aliasing rule correct? Is the above a valid way to prevent the "register-cache/memory incoherence" issue that can come with type punning?

Thanks!

Answer 1

I am not sure what you have in mind when you say "register-cache/memory coherency".

The issue with the aliasing violation is simply (as with all undefined behavior) that the optimizer may rely on it never happening (because it is UB), infer constraints on the program's execution from that, and use those constraints to produce optimized machine code that needs to work only when they hold.

For example, in your case, without the write through char*, the optimizer could see that the writes through F* cannot possibly modify z, because that would be an aliasing violation. It could therefore, e.g., move the writes through F* to after the output.

Alternatively, the compiler may see that z first has values written to it that can be calculated at compile time, and it may remember those values for optimization. It can then ignore the writes through F*, which cannot touch z in a program without UB, and on finally seeing the output it can simply choose to output constants corresponding to the values it "knows" are in z.
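The kind of reasoning described above can be shown on a much smaller function. This is only a sketch: store_both is a hypothetical helper that is not part of the question's code, and whether a given compiler actually performs this folding depends on the compiler and its flags.

```cpp
#include <cstdint>

// Hypothetical helper (not from the question): stores through two pointers
// of unrelated types, then reads the first object back.
std::int64_t store_both(std::int64_t* a, float* b)
{
    *a = 1;      // store through an int64_t*
    *b = 2.0f;   // store through a float*; under strict aliasing the
                 // compiler may assume this cannot modify *a
    return *a;   // so it is allowed to return the constant 1 here
                 // without reloading *a from memory
}
```

When a and b point to distinct objects, as the types require, the function is well defined and returns 1. If a caller passes a float* obtained by casting a pointer to the same int64_t, the program has UB, and the function may still return 1 even though the bytes in memory have changed.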

Your method of protection might thwart some of these optimizations, but there will always be others that a compiler may choose to employ.

For example, a compiler may recognize that the write through F* is UB and conclude that the only value size can ever have is 0, because the loop body can never be executed in a program without UB.

It can then use that knowledge to optimize the whole program to:

int main()
{
  int input;
  std::cin >> input;
  std::cerr << "\n";
  std::cerr << "Initial z[] = ";
  std::cerr << "\n\n";
  std::cerr << "After writing floats z[] = ";
  
  return 0;
}

With your method you can only hope that compilers never become sophisticated enough to make this determination (and I am not sure that there isn't already a compiler able and willing to do so).

If you want to use constructs that violate the aliasing rules, then you need to make sure that your compiler doesn't rely on those rules for optimization at all. Compilers usually have a flag for this, e.g. -fno-strict-aliasing for GCC. In your program, however, the aliasing violation is not the only problem: there also isn't actually any F object or array of F objects on which you would be allowed to do pointer arithmetic or access members. I am not sure that GCC's -fno-strict-aliasing flag is generally sufficient to guarantee that it won't rely on either of these two kinds of UB for optimization.

As a side note: while(true); is also undefined behavior in C++ (but not in C). You cannot have an infinite loop without I/O, atomic, or volatile operations.

For example, as long as the loop is there, Clang 13 with -O3 on Compiler Explorer outputs:
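For comparison, here is a minimal sketch of a way to put the bytes of an F into int64_t storage without any aliasing violation, by copying the object representation with std::memcpy. The helper names store_as_bytes and load_from_bytes are hypothetical, and the sketch assumes that F and int64_t have the same size, which the static_assert checks.

```cpp
#include <cstdint>
#include <cstring>

struct F { float x, y; };  // same layout as in the question

static_assert(sizeof(F) == sizeof(std::int64_t),
              "this sketch assumes F and int64_t have the same size");

// Hypothetical helper: copy the bytes of an F into an int64_t slot
// instead of writing through a cast F*.
inline void store_as_bytes(std::int64_t& slot, const F& value)
{
    std::memcpy(&slot, &value, sizeof slot);
}

// Hypothetical helper: copy the bytes of an int64_t slot back into an F.
inline F load_from_bytes(const std::int64_t& slot)
{
    F value;
    std::memcpy(&value, &slot, sizeof value);
    return value;
}
```

In the question's loop this would mean calling store_as_bytes(z[i], F{...}) instead of assigning to f[i].x and f[i].y. Because every access goes through an object of its real type or through memcpy, the compiler has to treat z[i] as modified when it is printed afterwards.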

Initial z[] = 0, 1, 2, 

After writing floats z[] = 4690138725358302659, 4668251232723858883, 4632222435709990994, 4210784, 1, 12884901889, 2, 4200237, 140113662963656, 4200160, 0, 4200160, 0, 0, 0, 140113661087923, 140113662942080, 140736215914984, 4295040000, 4198848, 4200160, -8494071215197471174, 4198608, 140736215914976, 0, 0, 8494236860351561274, 8413707913860136506, 0, 0, 0, 1, 140736215914984, 140736215915000, 140113666830736, 0, 0, 4198608, 140736215914976, 0, 0, 4198654, 140736215914968, 28, 1, 140736215916259, 0, 140736215916270, 140736215916410, 140736215916436, 140736215916463, 140736215916489, 140736215916515, 0, 33, 140736217014272, 16, 529267711, 6, 4096, 17, 100, 3, 4194368, 4, 56, 5, 11, 7, 140113666637824, 8, 0, 9, 4198608, 11, 0, 12, 0, 13, 0, 14, 0, 23, 0, 25, 140736215915385, 26, 2, 31, 140736215916525, 15, 140736215915401, 0, 0, 0, -5663448171308038656, 2109109041276691562, 14696481348417631, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8391735729100685312, 4921308987509732720, 6436278639021083743, 8011689642975907935, 7597692896546878576, 7813877729437312364, 7020094974597624431, 3328210922030065518, 8011686456465305392, 7597692896546878576, 7813877729437312364, 7020094974597624431, 3328210922030065518, 4193470700803862320, 7885630528017166127, 8675390226550253936, 7147056913697434736, 3329058620635505004, 3918810539134823984, 5719376094260428852, 7150963379136975952, 8605359904538979439, 4707178968379521377, 5642809484591980366, 4427723895174544723, 5548561706083904609, 5283936564644036947, 8028914707716066895, 8320788952091016562, 5786948835902442496, 8026326388909754708, 7023201308806115180, 4415020012612383609, 8011672841536692527, 32420700043113589, 0,

and I don't know whether Compiler Explorer is truncating the output: https://godbolt.org/z/bcq5vKonG

Answer 1 (2022-01-22 09:54:07)