Stack overflow during std::regex_replace

I'm trying to execute the following C++ STL-based code to replace text in a relatively large SQL script (~8MB):

std::basic_regex<TCHAR> reProc(_T("^[ \t]*create[ \t]+(view|procedure|proc)+[ \t]+(.+)$\n((^(?![ \t]*go[ \t]*).*$\n)+)^[ \t]*go[ \t]*$"));
std::basic_string<TCHAR> replace = _T("ALTER $1 $2\n$3\ngo");
return std::regex_replace(strInput, reProc, replace);

The result is a stack overflow, and it's hard to find information about that particular error on this particular site since that's also the name of the site.

Edit: I am using Visual Studio 2013 Update 5.

Edit 2: The original file is over 23,000 lines. I cut the file down to 3,500 lines and still get the error. When I cut it by another ~50 lines down to 3,456 lines, the error goes away. If I put just those cut lines into the file, the error is still gone. This suggests that the error is not related to specific text, but to the sheer amount of it.

Edit 3: A full working example is demonstrated operating properly here: https://regex101.com/r/iD1zY6/1. It doesn't work in that STL code, though.

The following trimmed-down version of your regex saves about 20% of processing steps according to regex101.

\\bcreate[ \t]+(view|procedure|proc)[ \t]+(.+)\n(((?![ \t]*go[ \t]*).*\n)+)[ \t]*go[ \t]*

Modifications:

  • inline anchors removed: you are explicitly testing for newline characters anyway
  • repetition operator on the db object keywords removed: a repetition at this point would make the original script syntactically invalid
  • initial whitespace pattern replaced by a word boundary (note the double backslash: the escape sequence is for the regex engine, not for the compiler)
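As a minimal sketch of the trimmed pattern above in use, the following applies it to a short script. It assumes plain `std::string` instead of `TCHAR` so that it builds outside Windows, and a C++11 raw string literal so that `\b`, `\t` and `\n` reach the regex engine without extra escaping. The helper name `alterWithGo` is hypothetical. One deliberate deviation: because `$3` already ends with a newline, the replacement is written as `$3go` rather than `$3\ngo`, which avoids an empty line before `go`.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites "create ... go" batches to "ALTER ... go".
std::string alterWithGo(const std::string& script) {
    // Raw string literal: the backslash sequences are interpreted by the
    // regex engine, not by the compiler.
    static const std::regex reProc(
        R"(\bcreate[ \t]+(view|procedure|proc)[ \t]+(.+)\n(((?![ \t]*go[ \t]*).*\n)+)[ \t]*go[ \t]*)");
    // $1 = object keyword, $2 = rest of the create line, $3 = body lines
    // (each including its trailing newline).
    return std::regex_replace(script, reProc, "ALTER $1 $2\n$3go");
}
```

This only shows that the pattern behaves as described on small inputs; it does not by itself address the recursion depth on an 8MB script.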

If you can be sure that ...

  • the create ... statements do not occur in string literals, and

  • you do not need to distinguish between create ... statements that are followed by a go and those that are not (e.g. because every statement is followed by a go)

...it might be easier still to just replace these strings:

std::basic_regex<TCHAR> reProc(_T("\\bcreate[ \t]+(view|procedure|proc)"));
std::basic_string<TCHAR> replace = _T("ALTER $1");
return std::regex_replace(strInput, reProc, replace);

(A demo of the latter approach reduces the steps to a little more than a quarter.)
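A self-contained sketch of this simpler replacement, assuming `std::string` instead of `TCHAR`. The helper name `createToAlter` is hypothetical, and the `icase` flag is my own assumption (SQL keywords are usually case-insensitive), not something stated in the answer.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces only the "create <object keyword>" prefix.
std::string createToAlter(const std::string& script) {
    // "\\b" in an ordinary literal yields the two characters \b for the regex
    // engine; a single "\b" would be compiled into a backspace character.
    static const std::regex reProc(
        "\\bcreate[ \t]+(view|procedure|proc)",
        std::regex::ECMAScript | std::regex::icase);  // icase is an assumption
    return std::regex_replace(script, reProc, "ALTER $1");
}
```

Note that the whitespace between `create` and the keyword is collapsed to a single space, and that the word boundary keeps identifiers such as `recreate` from matching.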

It turns out that STL regular expressions badly under-perform Perl's (about 100 times slower, if you can believe https://stackoverflow.com/a/37016671/78162), so it is apparently necessary to minimize the use of regular expressions in STL/C++ when performance is a serious concern. (The degree to which C++/STL under-performs here blew my mind, considering I presume C++ to generally be one of the more performant languages.) I ended up passing in the file stream, reading one line at a time, and only running the expression on lines that needed processing, like this:

   std::basic_string<TCHAR> result;
   std::basic_string<TCHAR> line;
   std::basic_regex<TCHAR> reProc(_T("^[ \t]*create[ \t]+(view|procedure|proc)+[ \t]+(.+)$"), std::regex::optimize);
   std::basic_string<TCHAR> replace = _T("ALTER $1 $2");

   // Loop on getline itself: testing eof() after the fact would append one
   // spurious empty line when the final read fails.
   while (std::getline(input, line)) {
      // Cheap pre-filter: only run the regex on lines whose first
      // non-blank text is "create" (compared case-insensitively).
      std::basic_string<TCHAR>::size_type pos = line.find_first_not_of(_T(" \t"));
      if ((pos != std::basic_string<TCHAR>::npos)
          && (_tcsnicmp(line.c_str() + pos, _T("create"), 6) == 0))
         result.append(std::regex_replace(line, reProc, replace));
      else
         result.append(line);
      result.append(_T("\n"));
   }
   return result;
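For readers without `TCHAR` and `_tcsnicmp`, here is a portable sketch of the same line-by-line idea using only the standard library. The names `startsWithCreate` and `alterLineByLine` are hypothetical. Two things are my assumptions rather than the answer's code: the `icase` flag is added so the regex agrees with the case-insensitive pre-filter, and the `+` after the keyword group is dropped.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: case-insensitive test for "create" at offset pos.
static bool startsWithCreate(const std::string& line, std::size_t pos) {
    static const std::string kw = "create";
    if (line.size() - pos < kw.size()) return false;
    return std::equal(kw.begin(), kw.end(), line.begin() + pos,
                      [](char a, char b) {
                          return a == std::tolower(static_cast<unsigned char>(b));
                      });
}

// Reads the script one line at a time and runs the regex only on
// candidate lines, so the engine never sees more than one line.
std::string alterLineByLine(std::istream& input) {
    static const std::regex reProc(
        "^[ \t]*create[ \t]+(view|procedure|proc)[ \t]+(.+)$",
        std::regex::ECMAScript | std::regex::icase | std::regex::optimize);
    std::string result, line;
    while (std::getline(input, line)) {
        std::size_t pos = line.find_first_not_of(" \t");
        if (pos != std::string::npos && startsWithCreate(line, pos))
            result += std::regex_replace(line, reProc, "ALTER $1 $2");
        else
            result += line;
        result += '\n';
    }
    return result;
}
```

It can be driven from a `std::istringstream` for testing or from a `std::ifstream` for a real script. Leading whitespace on a rewritten line is dropped, as in the original replacement string.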
