\d does not work but [0-9] does when using regular expressions in C

Question

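I don't understand why a regex pattern containing the \d character class does not work, while [0-9] does. Shorthand classes such as \s (whitespace) and \w (word characters) do work. My compiler is gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3. I am using the C regular expression library.

Why doesn't \d work?

The text string: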

const char *text = "148  apples    5 oranges";

This regular expression does not match the text string above:

const char *rstr = "^\\d+\\s+\\w+\\s+\\d+\\s+\\w+$";

This regular expression, which uses [0-9] instead of \d, does match:

const char *rstr = "^[0-9]+\\s+\\w+\\s+[0-9]+\\s+\\w+$";



#include <stdio.h>
#include <stdlib.h>
#include <regex.h>

#define N_MATCHES  30

//   output from gcc --version: gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
//   compile command used:  gcc -o tstc_regex tstc_regex.c

const char *text = "148  apples    5 oranges";
  const char *rstr = "^[0-9]+\\s+\\w+\\s+[0-9]+\\s+\\w+$";    // finds match
//const char *rstr = "^\\d+\\s+\\w+\\s+\\d+\\s+\\w+$";        // does not find match

int main(int argc, char**argv)
{
    regex_t   rgx;
    regmatch_t   matches[N_MATCHES];
    int status;
    status = regcomp(&rgx, rstr, REG_EXTENDED | REG_NEWLINE);
    if (status != 0) {
        fprintf(stdout, "regcomp error: %d\n", status);
        return 1;
    }
    status = regexec(&rgx, text, N_MATCHES, matches, 0);
    if (status == REG_NOMATCH) {
        fprintf(stdout, "regexec result: REG_NOMATCH (%d)\n", status);
    }
    else if (status != 0) {
        fprintf(stdout, "regexec error: %d\n", status);
        return 1;
    }
    else {
        fprintf(stdout, "regexec match found: %d\n", status);
    }
    regfree(&rgx);   /* release the memory allocated by regcomp */
    return 0;
}

Answer 1

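The regex flavor you are using is GNU ERE, which is similar to POSIX ERE but has a few extra features. Among them is support for the shorthand character classes \s, \S, \w and \W, but not for \d and \D. You can find more information here.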

Answer 2

In a strict POSIX environment, either pattern will probably fail to match. To make the pattern truly POSIX-compatible, use bracket expressions throughout:

const char *rstr = "^[[:digit:]]+[[:space:]]+[[:alpha:]]+[[:space:]]+[[:digit:]]+[[:space:]]+[[:alpha:]]+$";

↳ POSIX character classes

Answer 3

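According to the POSIX regular expression specification:

An ordinary character is a character in the supported character set, except for the ERE special characters listed in ERE Special Characters. The interpretation of an ordinary character preceded by a backslash ( '\\' ) is undefined.

So the only characters that can legally follow a \ are: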

\^    \.    \[    \$    \(    \)    \|
\*    \+    \?    \{    \\

All of these match the escaped character literally. Trying to use any other PCRE extension may not work.

Answer 4

\d is a Perl and Vim character class.

Use this instead:

 const char *rstr = "^[[:digit:]]+\\s+\\w+\\s+[[:digit:]]+\\s+\\w+$";


Answer 1
6 (accepted) 2016-06-22 01:23:59

Answer 2
4 2016-06-22 01:22:54

Answer 3
1 2016-06-22 00:57:58

Answer 4
1 2016-06-22 01:05:37
