Backslash in regular expression bracket expression

Given the regular expression "[\\^]", should it match the strings "\\" and "^"?

My reading of the relevant C++, POSIX, and ECMAScript standards is that for the POSIX (basic, extended, awk, grep, and egrep) syntaxes, the regex should match both strings, and for the ECMAScript syntax only the second string should be matched.

The POSIX references for EREs and the awk, grep, and egrep utilities all defer to the BRE specification (XBD 9.3.5/1), which says explicitly "The special characters '.', '*', '[', and '\\' (period, asterisk, left-bracket, and backslash, respectively) shall lose their special meaning within a bracket expression." I interpret that to mean that a backslash is just a backslash once inside a bracket expression.

The ECMAScript specification does not have the 'lose its special meaning' rule but instead specifies that a backslash followed by a non-alphanumeric character is just the character itself.
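
To make the two readings concrete, here is a minimal sketch of how the body of a simple bracket expression would be read under each rule. It is not taken from any standard library: `BracketRule` and `bracket_members` are hypothetical names, and the function deliberately ignores ranges, character classes, negation, and ECMAScript escapes such as \d (it assumes only "backslash plus non-alphanumeric" escapes appear).

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical names, for illustration only.
enum class BracketRule { Posix, EcmaScript };

// Returns the literal characters listed in `body`, the text between '[' and ']'.
// Assumption: no ranges, classes, or negation, and under the ECMAScript rule
// every backslash is followed by a non-alphanumeric character.
std::set<char> bracket_members(const std::string& body, BracketRule rule) {
    std::set<char> members;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (rule == BracketRule::EcmaScript && body[i] == '\\' && i + 1 < body.size()) {
            // ECMAScript reading: the backslash is dropped and the next
            // character is taken literally.
            members.insert(body[++i]);
        } else {
            // POSIX reading: inside brackets a backslash is an ordinary character.
            members.insert(body[i]);
        }
    }
    return members;
}
```

Under these assumptions, calling `bracket_members("\\^", BracketRule::Posix)` yields a set containing both the backslash and ^, while `BracketRule::EcmaScript` yields only ^, which is the difference I expect between the two syntax families.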

The GCC standard library (libstdc++) matches neither string, regardless of the regex syntax chosen. The LLVM standard library (libc++) matches the way I expect with the ECMAScript syntax but raises an exception when constructing the regex with any other syntax ("invalid escaped character").

Here's some code.

#include <iostream>
#include <regex>
#include <string>

void
do_match(std::string const& label, std::regex_constants::syntax_option_type type)
{
    try {
        std::regex re("[\\^]*", type);
        std::cmatch m;
        if (std::regex_match("\\^", m, re)) {
            for (auto res: m) {
                std::cerr << label << " match: " << res << "\n";
            }
        } else {
            std::cerr << label << " no match\n";
        }
    } catch (std::regex_error const& ex) {
        std::cerr << "caught exception: " << ex.what() << "\n";
    }
}

int
main()
{
    do_match("awk", std::regex_constants::awk);
    do_match("ecma", std::regex_constants::ECMAScript);
}

Are my expectations wrong, and if not, which standard library implementation is correct?

Given the regular expression "[\\^]", should it match the strings "\\" and "^"?

Using the std::regex_constants syntax options:

  1. ECMAScript, awk - No, it will not match. The \\^ escapes the ^, so [\\^] is interpreted as [^] (the "removal of escape characters", i.e. replacing \\^ with ^, comes before parsing the [ set). The ^ is then the first character after the [ bracket, so it is interpreted as "negation" (that is what I call it), and the bracket matches anything except the characters in the list. As the list in [^<this list here>] is empty, it will match anything except an empty list... Well, it will match nothing.

  2. basic, grep, extended, egrep - it will match both strings. The \\ loses its escaping meaning inside the [ . So [\\^] will literally match \\ or ^.
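
One way to cross-check the POSIX reading independently of std::regex is to ask the C library's own POSIX implementation through `regcomp`/`regexec` from `<regex.h>`. The sketch below assumes a POSIX system; `posix_match` is a hypothetical helper name, and the result only shows what that particular C library does, not what the C++ standard requires.

```cpp
#include <cassert>
#include <regex.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: compiles `pattern` with the given regcomp flags
// (0 for BRE, REG_EXTENDED for ERE) and reports whether `text` matches.
// Callers anchor the pattern themselves with ^ and $.
bool posix_match(const std::string& pattern, const std::string& text, int cflags) {
    regex_t re;
    // REG_NOSUB: only a yes/no answer is needed, no submatch positions.
    if (regcomp(&re, pattern.c_str(), cflags | REG_NOSUB) != 0) {
        throw std::runtime_error("regcomp failed for pattern: " + pattern);
    }
    const int rc = regexec(&re, text.c_str(), 0, nullptr, 0);
    regfree(&re);
    // Anything other than match / no match would be an internal error.
    assert(rc == 0 || rc == REG_NOMATCH);
    return rc == 0;
}
```

For example, `posix_match("^[\\^]$", "\\", REG_EXTENDED)` and `posix_match("^[\\^]$", "^", REG_EXTENDED)` test the two strings from the question against the ERE; passing 0 instead of REG_EXTENDED does the same for the BRE. If the backslash really loses its special meaning inside the brackets, both calls should return true.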
