
Why does a continuous initialization of a std::regex object slow down the program?

I have the following code snippet that reads lines from std::cin and prints them to std::cout:

#include <iostream>
#include <string>
#include <regex>
int main() {

  //std::regex e2("([^[:blank:]]+)|(\"[^\"]+\")|(\\([^\\)]+\\))");
  const size_t BUFSIZE = (1<<10);
  std::string buffer;
  buffer.reserve( BUFSIZE );

  while (std::getline( std::cin, buffer )) {
    std::cout << buffer << std::endl;
    //std::regex e1("([^[:blank:]]+)|(\"[^\"]+\")|(\\([^\\)]+\\))");
  }
  return 0;
}

The execution time is quite fast for an input of 9,800 lines:

real    0m0.116s
user    0m0.056s
sys     0m0.024s

However, if I uncomment the std::regex e1 object inside the while loop, execution slows down considerably:

real    0m2.859s
user    0m2.800s
sys     0m0.032s

On the other hand, uncommenting the std::regex e2 object outside the loop does not affect the execution time at all. Why is this happening, considering that I am not applying any regex matches and am only constructing an object?

NB: I've seen this thread, but it didn't shed any light.

In order for matching to be fast, the pattern must first be processed into a form that allows fast matching, and that processing takes a lot of time. This is normally done during construction of the regex object; in fact, that's the entire point of constructing the regex object! If no extra work were done during construction, there would be no point in having a separate regex object at all -- the match function would just take the pattern as a raw string and use it directly.

Regex matching is usually implemented as a finite state machine, and the implementation needs to build that state machine from the regular expression you provide. Some regular expressions produce very complex state machines; the complexity grows with the number of possible branches in the regex. The more complex the state machine, the more work is required to set up the regex object before it can start matching input strings.

As @Mehrdad correctly pointed out, the sole reason the regex interface exists as an object rather than a helper function is to separate the heavy operation of setting up the state machine from the search operations, so that each search is comparatively lightweight.

Here is the proposal for std::regex, which discusses these design nits in detail.
