
g++ Optimization: O2 flag fixes a broken code where O3 breaks it again

This code, which matches a string against an NFA and which I think requires O(N^2) memory, predictably breaks when the string size is 20,000, then works when compiled with -O2, then breaks again with -O3. Compilation was done with -std=c++14 enabled. In my opinion, the problem is a stack overflow.

The input string was "ab" repeated 10,000 times, plus a 'c' at the end. The image below contains the NFA I'm trying to match.

Specifically, my question is:

1) Which -O2 optimization is behind this fix (which I find impressive)?

2) And which -O3 optimization breaks it again?

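#include <map>
#include <string>
#include <vector>
using namespace std;

struct State
{
    map<char,vector<State*> > transitions; // '|' labels the e-transitions
    bool accepting = false;
};

// Recursive depth-first match: one stack frame per consumed character or e-transition.
bool match(State* state,string inp){
    if(inp=="") return state->accepting;

    for(auto s:state->transitions[inp[0]]) // consume one character
        if(match(s,inp.substr(1))) return true;

    for(auto s:state->transitions['|']) //e-transitions
        if(match(s,inp)) return true;

    return false;
}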

The gcc documentation says that -O3 includes all the optimizations of -O2, plus some more. I couldn't "get" some of those extras or their relevance to this problem. And I want to emphasize, given what I've seen in similar questions, that I'm not looking for specific ways to fix this problem.

[Image: test NFA]

As you have already figured out, the problem is the stack usage of your recursion. It is also true that tail-call optimization (TCO) would be performed neither for -O2 nor for -O3 (in theory it would be possible only for the last recursive call, which would not help in your case).

However, depending on the optimization level, your function needs a different amount of space on the stack. There is no guarantee that the -O3 version will be faster and need less stack space.

When we look at the assembly, we can see the following:

  1. -O3 reserves 88 bytes via subq $88, %rsp; the footprint on the stack is even larger because registers r12-r15 are also pushed onto the stack, in addition to the usual function prologue.

  2. -O2 reserves only 56 bytes in addition to the registers pushed onto the stack.

  3. Without optimization the footprint on the stack is the largest: every value has to be stored to and reloaded from the stack between two lines of the original code, so that debugging behaves predictably and values can be changed in the debugger.

That would explain your observations: without optimization the stack fills up pretty quickly. -O2 mitigates the problem (but doesn't fix it), so a recursion depth of 20000 can be handled; it will probably crash for 30000. -O3 has a larger stack footprint and already fails for smaller inputs.

The proper fix for this problem is obvious now: one should use either an iterative version of depth-first search or breadth-first search.
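
As a rough illustration of the iterative depth-first variant, here is a minimal sketch, not a drop-in replacement. The name match_iterative and the visited set are my own additions (assumptions, not part of the question's code). The work list lives on the heap, so the search depth is no longer limited by the thread's stack size. Like the original, it treats '|' as the e-transition label and does not follow e-transitions once the input is exhausted.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct State {
    std::map<char, std::vector<State*>> transitions;  // '|' labels e-transitions
    bool accepting = false;
};

// Hypothetical iterative counterpart of match(): depth-first search over
// (state, input position) pairs using an explicit, heap-allocated work list.
bool match_iterative(const State* start, const std::string& inp) {
    using Node = std::pair<const State*, std::size_t>;
    std::vector<Node> work{Node(start, 0)};
    std::set<Node> visited;  // assumption: added so that e-cycles terminate

    while (!work.empty()) {
        const Node cur = work.back();
        work.pop_back();
        if (!visited.insert(cur).second) continue;  // pair already explored

        const State* state = cur.first;
        const std::size_t pos = cur.second;

        if (pos == inp.size()) {
            if (state->accepting) return true;
            continue;  // as in the original: no e-transitions after the input ends
        }

        // Consume one character.
        auto it = state->transitions.find(inp[pos]);
        if (it != state->transitions.end())
            for (const State* next : it->second) work.push_back(Node(next, pos + 1));

        // Follow e-transitions without consuming input.
        it = state->transitions.find('|');
        if (it != state->transitions.end())
            for (const State* next : it->second) work.push_back(Node(next, pos));
    }
    return false;
}
```

The visited set costs memory proportional to the number of (state, position) pairs reached, but it also keeps the search from revisiting the same pair, which the recursive version does not guard against.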

Another issue in your code is the use of substr, which results in unnecessary memory copying and usage. Just pass an iterator to the first character of the string and increment it for the recursive call.
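
A minimal sketch of that change, assuming the same State struct as in the question. The iterator-pair signature and the convenience overload are my own choices. This version is still recursive, so it can still overflow the stack on very long inputs; it only removes the per-call string copy.

```cpp
#include <map>
#include <string>
#include <vector>

struct State {
    std::map<char, std::vector<State*>> transitions;  // '|' labels e-transitions
    bool accepting = false;
};

// Same recursion as in the question, but the remaining input is described by
// an iterator range instead of a freshly copied std::string.
bool match(State* state,
           std::string::const_iterator first,
           std::string::const_iterator last) {
    if (first == last) return state->accepting;

    for (State* s : state->transitions[*first])   // consume one character
        if (match(s, first + 1, last)) return true;

    for (State* s : state->transitions['|'])      // e-transitions
        if (match(s, first, last)) return true;

    return false;
}

// Convenience overload (an assumption, not in the original code): takes the
// string by const reference so that no copy is made at the call site either.
bool match(State* state, const std::string& inp) {
    return match(state, inp.cbegin(), inp.cend());
}
```

Each frame now carries two iterators instead of a std::string object and its heap buffer, which removes the O(N^2) copying across the whole recursion.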
