g++ optimization: the O2 flag fixes broken code that O3 breaks again

Question

This code matches a string against an NFA. I think it requires O(N^2) memory. It predictably breaks when the string size is 20,000, then works when compiled with -O2, then breaks again with -O3. Compilation was done with -std=c++14 enabled. In my opinion, the problem is a stack overflow.

The input string was "ab" repeated 10,000 times, plus a 'c' at the end. The image below contains the NFA I'm trying to match.

Specifically, my questions are:

1) Which -O2 optimization is behind this fix (which I find impressive)?

2) And which -O3 optimization breaks it again?

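#include <map>
#include <string>
#include <vector>
using namespace std;

struct State
{
    map<char,vector<State*> > transitions; // '|' is the key for e-transitions
    bool accepting = false;
};

// Recursive match: every call receives its own copy of the remaining input.
bool match(State* state,string inp){
    if(inp=="") return state->accepting;

    for(auto s:state->transitions[inp[0]]) // consume one character
        if(match(s,inp.substr(1))) return true;

    for(auto s:state->transitions['|']) //e-transitions
        if(match(s,inp)) return true;

    return false;
}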

The gcc documentation says that O3 includes all the optimizations of O2, plus some more. I couldn't work out what some of those extras do or how they relate to this problem. I also want to emphasize, given what I've seen in similar questions, that I'm not looking for specific ways to fix this problem.

Answer 1

As you have already figured out, the problem is the stack usage of your recursion. It is also true that tail-call optimization (TCO) would be performed neither for -O2 nor for -O3 (in theory it would be possible only for the last recursive call, which would not help in your case).

However, depending on the optimization level, your function needs a different amount of space on the stack. There is no guarantee that the -O3 version will be faster or need less space on the stack.

When we look at the assembly, we can see the following:

-O3 reserves 88 bytes via subq $88, %rsp. The footprint on the stack is even larger, because registers r12-r15 are also pushed on the stack in addition to the usual function prologue.
-O2 reserves only 56 bytes in addition to the registers pushed on the stack.
Without optimization the footprint on the stack is the largest: every value has to be stored to and reloaded from the stack between two lines of the original code, so that debugging behaves predictably and values can be changed in the debugger.
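The figures above were read from the generated assembly. A rough way to observe a per-call footprint at run time is to compare the address of a local variable at two recursion depths. The sketch below does this for a toy recursive function, not for match itself; the names local_address_at and bytes_per_frame are hypothetical, and the noinline attribute assumes GCC or Clang. The result depends on compiler, flags and platform, so treat it as an illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the address (as an integer) of a local variable that lives in the
// frame `depth` recursive calls below the first call.
#if defined(__GNUC__)
__attribute__((noinline))  // assumption: GCC/Clang; keeps one frame per call
#endif
std::uintptr_t local_address_at(int depth) {
    volatile char marker = 0;
    std::uintptr_t here = reinterpret_cast<std::uintptr_t>(&marker);
    if (depth == 0) return here;
    std::uintptr_t deepest = local_address_at(depth - 1);
    marker = 1;  // a volatile store after the call, so this is not a tail call
    return deepest;
}

// Average distance in bytes between the frames of two consecutive calls.
std::size_t bytes_per_frame(int depth) {
    assert(depth > 0);
    std::uintptr_t top = local_address_at(0);
    std::uintptr_t bottom = local_address_at(depth);
    std::uintptr_t span = top > bottom ? top - bottom : bottom - top;
    return static_cast<std::size_t>(span / static_cast<std::uintptr_t>(depth));
}
```

Calling bytes_per_frame(100) from builds made with -O0, -O2 and -O3 should make any difference in frame size visible. Reading the assembly, or compiling with GCC's -fstack-usage option, gives more precise numbers.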

That would explain your observations: without optimization the stack fills up pretty quickly. The -O2 optimization mitigates the problem (but doesn't fix it), so a recursion depth of 20,000 can be handled, although it will probably crash for 30,000. The -O3 build has a larger stack footprint and already fails for smaller inputs.

The proper fix for this problem is now obvious: use either an iterative version of depth-first search or breadth-first search.
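As a minimal sketch of the iterative depth-first variant (not the asker's code; match_iterative, Config, pending and visited are names introduced here for illustration), the pending work can be kept in a heap-allocated vector of (state, input position) pairs instead of on the call stack. The visited set is an addition that the original code does not have: it stops the search from looping on e-transition cycles and from exploring the same pair twice.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct State {
    std::map<char, std::vector<State*>> transitions;  // '|' keys e-transitions
    bool accepting = false;
};

// Iterative depth-first search over (state, input position) pairs.
bool match_iterative(const State* start, const std::string& inp) {
    assert(start != nullptr);
    using Config = std::pair<const State*, std::size_t>;
    std::vector<Config> pending;  // explicit stack, stored on the heap
    std::set<Config> visited;     // pairs that were already expanded
    pending.emplace_back(start, std::size_t{0});

    while (!pending.empty()) {
        const Config current = pending.back();
        pending.pop_back();
        if (!visited.insert(current).second) continue;  // seen before

        const State* state = current.first;
        const std::size_t pos = current.second;
        if (pos == inp.size()) {
            if (state->accepting) return true;
            continue;  // like the original, no e-moves once the input is used up
        }
        // Transitions that consume the current character.
        auto on_char = state->transitions.find(inp[pos]);
        if (on_char != state->transitions.end())
            for (const State* next : on_char->second)
                pending.emplace_back(next, pos + 1);
        // e-transitions keep the input position unchanged.
        auto on_eps = state->transitions.find('|');
        if (on_eps != state->transitions.end())
            for (const State* next : on_eps->second)
                pending.emplace_back(next, pos);
    }
    return false;
}
```

With this shape the depth of the search is limited by available heap memory rather than by the stack size, and the input string is never copied.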

Another issue in your code is the use of substr, which results in unnecessary memory copying and usage. Just pass an iterator to the first character of the string and increment it for the recursive call.
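A minimal sketch of that suggestion, assuming the same State layout as in the question (match_range and Iter are names introduced here): the function takes a pair of iterators, so no call owns a copy of the remaining input. In the original, every active call holds its own copy, which adds up to roughly N^2/2 characters for an input of length N. Note that this version is still recursive, so it only removes the copying and does not remove the stack-depth limit.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <vector>

struct State {
    std::map<char, std::vector<State*>> transitions;  // '|' keys e-transitions
    bool accepting = false;
};

using Iter = std::string::const_iterator;

// Same logic as the original match, but the remaining input is the range
// [first, last) inside the caller's string instead of a fresh substring.
bool match_range(State* state, Iter first, Iter last) {
    assert(state != nullptr);
    if (first == last) return state->accepting;

    for (State* s : state->transitions[*first])        // consume one character
        if (match_range(s, std::next(first), last)) return true;

    for (State* s : state->transitions['|'])           // e-transitions
        if (match_range(s, first, last)) return true;

    return false;
}
```

A call then looks like match_range(&start, input.cbegin(), input.cend()), where start and input are the caller's own start state and string.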
