Is gcc 4.8 or earlier buggy about regular expressions?

I am trying to use std::regex in a C++11 piece of code, but it appears that the support is a bit buggy. An example:

#include <regex>
#include <iostream>

int main (int argc, const char * argv[]) {
    std::regex r("st|mt|tr");
    std::cerr << "st|mt|tr" << " matches st? " << std::regex_match("st", r) << std::endl;
    std::cerr << "st|mt|tr" << " matches mt? " << std::regex_match("mt", r) << std::endl;
    std::cerr << "st|mt|tr" << " matches tr? " << std::regex_match("tr", r) << std::endl;
}

outputs:

st|mt|tr matches st? 1
st|mt|tr matches mt? 1
st|mt|tr matches tr? 0

when compiled with gcc (MacPorts gcc47 4.7.1_2) 4.7.1, either with

g++ *.cc -o test -std=c++11
g++ *.cc -o test -std=c++0x

or

g++ *.cc -o test -std=gnu++0x

Besides, the regex works well if I only have two alternative patterns, e.g. st|mt, so it looks like the last one is not matched for some reason. The code works well with the Apple LLVM compiler.

Any ideas about how to solve the issue?

Update: one possible solution is to use groups to implement multiple alternatives, e.g. (st|mt)|tr.

<regex> was implemented and released in GCC 4.9.0.

In your (older) version of GCC, it is not implemented.

That prototype <regex> code was added when all of GCC's C++0x support was highly experimental, tracking early C++0x drafts and being made available for people to experiment with. That allowed people to find problems and give feedback to the standard committee before the standard was finalised. At the time lots of people were grateful to have had access to bleeding edge features long before C++11 was finished and before many other compilers provided any support, and that feedback really helped improve C++11. This was a Good Thing(TM).

The <regex> code was never in a useful state, but was added as a work-in-progress like many other bits of code at the time. It was checked in and made available for others to collaborate on if they wanted to, with the intention that it would be finished eventually.

That's often how open source works: Release early, release often -- unfortunately in the case of <regex> we only got the early part right and not the often part that would have finished the implementation.

Most parts of the library were more complete and are now almost fully implemented, but <regex> hadn't been, so it stayed in the same unfinished state since it was added.

Seriously though, who thought that shipping an implementation of regex_search that only does "return false" was a good idea?

It wasn't such a bad idea a few years ago, when C++0x was still a work in progress and we shipped lots of partial implementations. No-one thought it would remain unusable for so long so, with hindsight, maybe it should have been disabled and required a macro or build-time option to enable it. But that ship sailed long ago. There are exported symbols from the libstdc++.so library that depend on the regex code, so simply removing it (in, say, GCC 4.8) would not have been trivial.

Feature Detection

This is a snippet that uses C preprocessor defines to detect whether the libstdc++ <regex> is actually implemented:

#include <regex>
#if __cplusplus >= 201103L &&                             \
    (!defined(__GLIBCXX__) || (__cplusplus >= 201402L) || \
        (defined(_GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT) || \
         defined(_GLIBCXX_REGEX_STATE_LIMIT)           || \
             (defined(_GLIBCXX_RELEASE)                && \
             _GLIBCXX_RELEASE > 4)))
#define HAVE_WORKING_REGEX 1
#else
#define HAVE_WORKING_REGEX 0
#endif

Macros

  • _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT is defined in bits/regex.tcc in 4.9.x
  • _GLIBCXX_REGEX_STATE_LIMIT is defined in bits/regex_automaton.h in 5+
  • _GLIBCXX_RELEASE was added to 7+ as a result of this answer and is the GCC major version

Testing

You can test it with GCC like this:

cat << EOF | g++ --std=c++11 -x c++ - && ./a.out
#include <regex>

#if __cplusplus >= 201103L &&                             \
    (!defined(__GLIBCXX__) || (__cplusplus >= 201402L) || \
        (defined(_GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT) || \
         defined(_GLIBCXX_REGEX_STATE_LIMIT)           || \
             (defined(_GLIBCXX_RELEASE)                && \
             _GLIBCXX_RELEASE > 4)))
#define HAVE_WORKING_REGEX 1
#else
#define HAVE_WORKING_REGEX 0
#endif

#include <cstdlib>
#include <iostream>
#include <string>

int main() {
  const std::regex regex(".*");
  const std::string string = "This should match!";
  const auto result = std::regex_search(string, regex);
#if HAVE_WORKING_REGEX
  std::cerr << "<regex> works, look: " << std::boolalpha << result << std::endl;
#else
  std::cerr << "<regex> doesn't work, look: " << std::boolalpha << result << std::endl;
#endif
  return result ? EXIT_SUCCESS : EXIT_FAILURE;
}
EOF

Results

Here are some results for various compilers:


$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ./a.out
<regex> doesn't work, look: false

$ gcc --version
gcc (GCC) 6.2.1 20160830
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ./a.out
<regex> works, look: true

$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ./a.out
<regex> works, look: true

$ gcc --version
gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ./a.out
<regex> works, look: true

$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ./a.out
<regex> works, look: true

$ gcc --version
gcc (GCC) 6.2.1 20160830
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ clang --version
clang version 3.9.0 (tags/RELEASE_390/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ ./a.out  # compiled with 'clang -lstdc++'
<regex> works, look: true

Here be Dragons

This is totally unsupported and relies on the detection of private macros that the GCC developers have put into the bits/regex* headers. They could change and go away at any time. Hopefully, they won't be removed in the current 4.9.x, 5.x, 6.x releases but they could go away in the 7.x releases.

If the GCC developers added a #define _GLIBCXX_HAVE_WORKING_REGEX 1 (or something, hint hint nudge nudge) in the 7.x release that persisted, this snippet could be updated to include that and later GCC releases would work with the snippet above.

As far as I know, all other compilers have a working <regex> when __cplusplus >= 201103L, but YMMV.

Obviously this would completely break if someone defined the _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT or _GLIBCXX_REGEX_STATE_LIMIT macros outside of the libstdc++-v3 headers.

At the moment, g++ (GCC) 4.9.2 with -std=c++14 still does not accept regex_match.

Here is an approach that works like regex_match but uses sregex_token_iterator instead, and it works with g++.

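#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main() {
    std::string line = "1a2b3c";
    std::regex re("(\\d)");
    // Collect capture group 1 of every match in line.
    std::vector<std::string> inVector{
        std::sregex_token_iterator(line.begin(), line.end(), re, 1), {}
    };

    // prints all matches, one per line, as index:match
    for (std::size_t i = 0; i < inVector.size(); ++i)
        std::cout << i << ":" << inVector[i] << std::endl;
}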

It prints the matches 1, 2 and 3, each on its own line and prefixed with its index (0:1, 1:2, 2:3).

You can read the sregex_token_iterator reference at: http://en.cppreference.com/w/cpp/regex/regex_token_iterator
