C regular expressions: matching multiple parts of a string

Question

I have a C program where I can't get the regular expression matching to work the way I want. Basically, I want to match the first character (W or M) in testStr as the first match and the name of the log file (TESTY.LOG) as the second match. Here is what I have so far:

#include    <stdio.h>
#include    <stdlib.h>
#include    <regex.h>
#define     MAX_MATCHES 2
.....
char testStr[20]="W TESTY.LOG ";
char temp[100];
int reti;
regex_t regex;
regmatch_t matches[MAX_MATCHES];
int i;
int numchars;

/* Compile regular expression */
reti = regcomp(&regex, "^([W|M])[[:space:]]([A-Z|0-9|\.]{1,})[[:space:]]*$", REG_EXTENDED);
/* Execute regular expression */
reti = regexec(&regex, testStr, MAX_MATCHES, matches, 0);
if (!reti) {
  for (i=0; i < MAX_MATCHES; i++) {
    numchars = (int)matches[i].rm_eo - (int)matches[i].rm_so;
    strncpy(temp,testStr+matches[i].rm_so,numchars);
    temp[numchars] = '\0';
  }
}

When I run this in gdb, I see the following for matches:

(gdb) display matches
1: matches = {{rm_so = 0, rm_eo = 15}, {rm_so = 0, rm_eo = 1}}

2: temp = "W TESTY.LOG"

and

2: temp = "W"

So I'm getting the first character OK, but I'm not getting just the log file name for the second match. I use regex in Perl, but I'm new to regex in ANSI C. I feel like I'm missing something basic here.

Answer 1

Match 0 is the part of the string matched by the entire regex (Perl's $&). Match i for i > 0 is the part of the match corresponding to capture number i, the same as Perl's $1, $2, and so on. You have two captures, so you should expect three matches. But you define MAX_MATCHES as 2, so the last match is discarded.
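To make the slot counting concrete, here is a minimal sketch that sizes the match array from the compiled pattern instead of a hard-coded constant. The helper name group_span and its parameters are made up for illustration; re_nsub is the POSIX regex_t field that holds the number of parenthesized groups.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: reports where capture `group` of `pattern` matched
 * in `text`. Group 0 is the whole match, groups 1..n are the captures.
 * Returns 0 on success and nonzero on any failure.
 */
int group_span(const char *pattern, const char *text, size_t group,
               regoff_t *start, regoff_t *end)
{
    regex_t regex;
    regmatch_t *m;
    size_t nslots;
    int rc;

    assert(pattern != NULL && text != NULL && start != NULL && end != NULL);

    rc = regcomp(&regex, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;

    /* re_nsub counts the capture groups; add one slot for the whole match. */
    nslots = regex.re_nsub + 1;
    if (group >= nslots) {
        regfree(&regex);
        return -1;
    }

    m = malloc(nslots * sizeof *m);
    if (m == NULL) {
        regfree(&regex);
        return -1;
    }

    rc = regexec(&regex, text, nslots, m, 0);
    if (rc == 0) {
        *start = m[group].rm_so;
        *end = m[group].rm_eo;
    }

    free(m);
    regfree(&regex);
    return rc;
}
```

With a pattern that has two captures, nslots comes out as 3, so asking for group 2 works, while asking for group 3 is rejected before regexec is called.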

Also, the regular expression

^([W|M])[[:space:]]([A-Z|0-9|\.]{1,})[[:space:]]*$

is a little odd. I think you should reread the documentation about character classes in regular expressions; in this case, it is the same in Perl as it is in POSIX extended REs. [W|M] matches any of the three characters W, | or M. Similarly, [A-Z|0-9|\.]{1,} matches one or more of an uppercase letter, a digit, the character | or the character . (a period).

The backslash is irrelevant since it only escapes the . in the string literal, where escaping is unnecessary. If you had compiled with warnings enabled (-Wall), your C compiler would probably have warned you that the escape sequence is not legal. If you had actually passed the backslash on to the regex library, it would have interpreted it as another possible match for the character class.

Also, {1,} can be conveniently written as +, both in Perl and in POSIX extended REs.

In short, what you probably wanted was:

reti = regcomp(&regex, "^([WM])[[:space:]]([A-Z0-9.]+)[[:space:]]*$", REG_EXTENDED);

You could also use

reti = regcomp(&regex, "^([WM])[[:space:]]([[:alnum:].]+)[[:space:]]*$", REG_EXTENDED);
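
Putting the pieces together, something along these lines should pull out both fields. Treat it as a sketch under assumptions, not the one true way: parse_log_line, NUM_GROUPS and NUM_SLOTS are names invented here, and the pattern is the first corrected one from above.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Two capture groups, plus one slot for the whole match. */
#define NUM_GROUPS 2
#define NUM_SLOTS  (NUM_GROUPS + 1)

/*
 * Hypothetical helper: parses a line such as "W TESTY.LOG ", storing the
 * first capture (W or M) in *kind and the second (the file name) in name.
 * Returns 0 on success, nonzero if the line does not match or name is
 * too small to hold the file name.
 */
int parse_log_line(const char *line, char *kind, char *name, size_t name_size)
{
    regex_t regex;
    regmatch_t m[NUM_SLOTS];
    size_t len;
    int rc;

    assert(line != NULL && kind != NULL && name != NULL);

    rc = regcomp(&regex, "^([WM])[[:space:]]([A-Z0-9.]+)[[:space:]]*$",
                 REG_EXTENDED);
    if (rc != 0)
        return rc;

    rc = regexec(&regex, line, NUM_SLOTS, m, 0);
    regfree(&regex);
    if (rc != 0)
        return rc;

    /* m[0] is the whole match; m[1] and m[2] are the two captures. */
    len = (size_t)(m[2].rm_eo - m[2].rm_so);
    if (len >= name_size)
        return -1;

    *kind = line[m[1].rm_so];
    memcpy(name, line + m[2].rm_so, len);
    name[len] = '\0';
    return 0;
}
```

Called with the test string from the question, this leaves 'W' in *kind and "TESTY.LOG" in name, without the trailing space.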
