Why does GCC delete my code at -O3 but not at -O0?

Question

Recently I have been trying to learn about rvalues and perfect forwarding. While playing around with some constructs, I came across some peculiar behavior when switching compilers and optimization levels.

Compiling the code with GCC with no optimization turned on yields the expected results, but turning on any optimization level results in all of my code being deleted. Compiling the same code with clang with no optimizations also yields the expected results, and turning on optimizations in clang still yields the expected results.

I know this screams undefined behavior, but I just cannot figure out what exactly is going wrong and what is causing the discrepancy between the two compilers.

gcc -O0 -std=c++17 -Wall -Wextra

https://godbolt.org/z/5xY1Gz

gcc -O3 -std=c++17 -Wall -Wextra

https://godbolt.org/z/fE3TE5

clang -O0 -std=c++17 -Wall -Wextra

https://godbolt.org/z/W98fh8

clang -O3 -std=c++17 -Wall -Wextra

https://godbolt.org/z/6sEo8j

#include <utility>

// lambda_t is the type of thing we want to call.
// capture_t is the type of a helper object that
// contains all parameters meant to be passed to the callable
template< class lambda_t, class capture_t >
struct CallObject {

    lambda_t  m_lambda;
    capture_t m_args;

    typedef decltype( m_args(m_lambda) ) return_t;

    //Construct the CallObject by perfect forwarding, which is
    //necessary because these may be lambdas that have
    //captured objects and we don't want unnecessary copies
    //while passing these around
    CallObject( lambda_t&& p_lambda, capture_t&& p_args ) :
        m_lambda{ std::forward<lambda_t>(p_lambda) },
        m_args  { std::forward<capture_t>(p_args) }
    {

    }

    //Applies the arguments captured in m_args to the thing
    //we actually want to call
    return_t invoke() {
        return m_args(m_lambda);
    }

    //Deleting special members for testing purposes
    CallObject() = delete;
    CallObject( const CallObject& ) = delete;
    CallObject( CallObject&& ) = delete;
    CallObject& operator=( const CallObject& ) = delete;
    CallObject& operator=( CallObject&& ) = delete;
};

//Factory helper function that is needed to create a helper
//object that contains all the parameters required for the
//callable, as well as to help properly templatize
//the CallObject
template< class lambda_t, class ... Tn >
auto Factory( lambda_t&& p_lambda, Tn&& ... p_argn ){

    //Using a lambda as helper object to contain all the required parameters for the callable
    //This conveniently allows for storing values, references and so on
    auto x = [&p_argn...]( lambda_t& pp_lambda ) mutable -> decltype(auto) {

        return pp_lambda( std::forward<decltype(p_argn)>(p_argn) ... );
    };

    typedef decltype(x) xt;
    //explicit template arguments are not strictly needed in this case, but
    //they are given here for the sake of readability, since we then
    //need to forward the lambda that captures the arguments
    return CallObject< lambda_t, xt >( std::forward<lambda_t>(p_lambda), std::forward<xt>(x) );
}

int main(){

    auto xx = Factory( []( int a, int b ){

        return a+b;

    }, 10, 3 );

    int q = xx.invoke();

    return q;
}

Answer 1

If something like this happens, it's usually because you have undefined behavior somewhere in your program. The compiler detected this and, when optimizing aggressively, threw away your entire program as a result.

In your concrete example, you already get a hint that something is not quite right in the form of a compiler warning:

<source>: In function 'int main()':
<source>:45:18: warning: '<anonymous>' is used uninitialized [-Wuninitialized]
   45 |         return a+b;
      |                  ^

How could this happen? What could lead to b being uninitialized at this point?

Since b is a function parameter at this point, the problem must lie with the caller of that lambda. Examining the call site, we notice something fishy:

auto x = [&p_argn...]( lambda_t& pp_lambda ) mutable -> decltype(auto) {
    return pp_lambda( std::forward<decltype(p_argn)>(p_argn) ... );
};

The argument bound to b is passed in through the parameter pack p_argn. But notice the lifetime of that parameter pack: it is captured by reference. So there is no perfect forwarding going on here, despite the std::forward in the lambda body, because the lambda only stores references and does not "see" what happens outside its body in the surrounding function. By the time the lambda is invoked, the objects those references referred to are gone. You get the same lifetime problem with a too, but for some reason the compiler chooses not to complain about that one. That's undefined behavior for you: there is no guarantee you'll get a warning for it.

The quickest way to fix this is to capture the arguments by value. You can retain the perfect-forwarding property by using a named capture (an init-capture with a pack expansion, which requires C++20), with somewhat peculiar syntax:

auto x = [...p_argn = std::forward<decltype(p_argn)>(p_argn)]( lambda_t& pp_lambda ) mutable -> decltype(auto) {
    return pp_lambda(std::move(p_argn)... );
};
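Since the question compiles with -std=c++17, where the pack init-capture above is not available, the same by-value idea can be sketched with std::tuple and std::apply. This is only an illustration under that assumption: bind_by_value and add_bound are hypothetical names that do not appear in the original code, and the sketch drops the CallObject wrapper for brevity.

```cpp
#include <tuple>
#include <utility>

// Hypothetical helper, not part of the original code: stores the callable and
// decayed copies of its arguments, so no reference to a caller temporary survives.
template <class F, class... Args>
auto bind_by_value(F&& f, Args&&... args) {
    return [fn = std::forward<F>(f),
            stored = std::make_tuple(std::forward<Args>(args)...)]() mutable -> decltype(auto) {
        // std::apply expands the tuple elements into the call fn(a, b, ...).
        return std::apply(fn, stored);
    };
}

// Example use: the literals 10 and 3 are copied into the closure,
// so calling it after the temporaries are gone is well-defined.
inline int add_bound() {
    auto call = bind_by_value([](int a, int b) { return a + b; }, 10, 3);
    return call();
}
```

Here the closure owns everything it needs, at the cost of one copy or move per argument. The stored arguments are passed to the callable as lvalues, so the sketch can be invoked more than once.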

Be sure you understand what is actually being stored where in this case; maybe even draw a picture. It is vital to know exactly where the individual objects live when writing code like this, otherwise it is very easy to write lifetime bugs like this one.

Answer 2

Why does GCC delete my code on O3

Because GCC is very smart: it figures out that your program doesn't depend on any runtime input, and thus optimises it to constant output at compile time.

just cannot figure out what exactly is going wrong and whats causing the discrepancy between the two compilers.

The behaviour of the program is undefined. There is no reason to expect compilers to agree, or to expect any particular behaviour.

The behaviour of the program is undefined.

But why?

Here:

 auto xx = Factory(the_lambda, 10, 3);

You pass literals to the function, which are prvalues.

 auto Factory( lambda_t&& p_lambda, Tn&&... p_argn )

The function accepts them by reference. Therefore temporary objects are created, whose lifetime lasts until the end of the full-expression (which is longer than the lifetime of the parameter references, so the lifetime of the temporaries is not extended).

 auto x = [&p_argn...]( //...

The temporaries are stored in a lambda... by reference. At no point is an integer stored in the lambda.

When you later call the lambda, the temporary objects that were referred to no longer exist. Those objects are accessed outside their lifetime, and the behaviour of the program is undefined.

Mistakes like this are the reason why std::thread, std::bind and similar facilities that bind arguments always store a value rather than a reference.
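The value-storing behaviour of std::bind can be seen in a small sketch. The function names bound_by_value and bound_by_reference are made up for this illustration; std::bind and std::ref are the standard facilities from <functional>.

```cpp
#include <functional>

// std::bind stores copies of a and b, so the later change to a is not observed.
inline int bound_by_value() {
    int a = 10, b = 3;
    auto f = std::bind([](int x, int y) { return x + y; }, a, b);
    a = 100;
    return f();  // uses the stored copy of a (10)
}

// std::ref opts in to reference semantics explicitly; the caller is then
// responsible for keeping a alive for as long as f may be called.
inline int bound_by_reference() {
    int a = 10, b = 3;
    auto f = std::bind([](int x, int y) { return x + y; }, std::ref(a), b);
    a = 100;
    return f();  // sees the updated a (100)
}
```

The default is the safe one: a reference is stored only when the caller asks for it with std::ref, which makes the lifetime obligation visible at the call site.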

Answer 3

... would yield expected results, however turning on any optimization level then would result all of my code being deleted.

The question is:

What exactly do you expect?

Most people do not expect a program to contain certain assembler code; most people only expect executable programs (under Windows this would be the .exe file) to have a certain "black-box" behaviour:

The program should print a certain text to the console, write to certain files, display certain windows in the GUI, print certain text on the printer, create certain network connections and so on.

The only "black-box" behaviour your program has is that it returns the exit code 0.

This means that the best possible compiler optimization throws away everything that is not needed to return 0 as the exit() code.

... and this means that the following code remains on 32- and 64-bit x86 systems:

xor eax, eax
ret

And this is exactly what was done in the link you provided.

(EDIT)

Sorry, but I didn't read the following part of your question:

I know this screams undefined behavior...

In this case this means:

The unoptimized program (-O0) will return different values depending on the data that is in RAM before your program is started.

Depending on the OS you use, this may depend on the program that was running before your program.

Obviously, the "black-box" behaviour of your (unoptimized) program may be to return either 0 or 13 as the exit() code, depending on the content of RAM before the program starts.

Therefore the "best possible" compiler optimization may simply return either 0 or 13 as the exit() code, assuming that RAM contains certain data before your program starts.

You might argue: "But my OS will set the RAM content to a certain value (e.g. 0) before the program is started."

However, even in this case the exit() code still depends on how exactly the (non-optimizing) compiler translates the program.

Answer 4

You got some big hints from the compiler:

<source>: In function 'int main()':
<source>:45:18: warning: '<anonymous>' is used uninitialized in this function [-Wuninitialized]
   45 |         return a+b;
      |                  ^
<source>:45:18: warning: '<anonymous>' is used uninitialized in this function [-Wuninitialized]
ASM generation compiler returned: 0

The issue is that you capture your argument list (10, 3) by reference, but those are temporary values when captured. If you either capture by value or pass actual variables, the code compiles without errors and I get the expected result.

The reason all your code is "deleted" is that gcc and clang are both smart enough to realize you're asking them to add two numbers together, so they've optimized almost your entire program away. The final assembly looks like this:
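The "pass actual variables" option can be sketched as follows. This is a simplified stand-in rather than the question's Factory: make_ref_call and sum_named_variables are hypothetical names, and the helper deliberately takes non-const lvalue references so that a literal such as 10 is rejected at compile time.

```cpp
// Hypothetical helper, not the original Factory: accepts only lvalues, so the
// closure can only refer to objects that the caller has given a name.
template <class F, class... Args>
auto make_ref_call(F& f, Args&... args) {
    // Capturing by reference is fine here as long as the referenced objects
    // outlive every call of the returned closure.
    return [&f, &args...]() -> decltype(auto) { return f(args...); };
}

inline int sum_named_variables() {
    auto add = [](int a, int b) { return a + b; };
    int x = 10, y = 3;
    // add, x and y all live until the end of this function, which is
    // longer than the closure is used.
    auto call = make_ref_call(add, x, y);
    return call();
}
```

Storing references is not wrong in itself; it just moves the lifetime responsibility to the caller, which is why by-value storage is the safer default.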

main:
        mov     eax, 13
        ret
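For comparison, a well-defined version of the same computation shows why a constant result is plausible: every input is known at compile time. The names add and folded_result are chosen for this sketch only, and whether a particular compiler emits a single mov for it depends on the compiler and its flags.

```cpp
// All inputs are compile-time constants, so the result can be computed
// before the program ever runs.
constexpr int add(int a, int b) { return a + b; }

// Checked by the compiler itself: this fails to compile if the sum is not 13.
static_assert(add(10, 3) == 13, "sum is evaluated at compile time");

// An optimizer is free to replace this body with the constant 13.
inline int folded_result() { return add(10, 3); }
```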

4 answers

Answer 1
4 2020-08-11 05:29:50

Answer 2
3 accepted 2020-08-11 05:29:47

Answer 3
1 2020-08-11 05:29:00

Answer 4
0 2020-08-11 05:32:01