
Why does this regexp not match?

my $genlog_line_1= qr{
   \A
   (?:(\d{6}\s+\d{1,2}:\d\d:\d\d|\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+(?:Z|-?\d\d:\d\d)?))? # Timestamp
   \s+
   (?:\s*(\d+))                     # Thread ID
   \s
   (\w+)                            # Command
   \s+
   (.*)                             # Argument
   \Z
}xs;

my $line = "2018-12-14T17:32:52.236100+08:00        477637459 Query SELECT dv.mandatory,dv.optional FROM dbversion dv";

my ($ts, $thread_id, $cmd, $arg) = $line =~ m/$genlog_line_1/;

print $ts, $thread_id, $cmd, $arg;

Why does the regexp not match? What I expect is:

Timestamp 2018-12-14T17:32:52.236100
thread_id 477637459 
cmd Query 
arg  SELECT dv.mandatory,dv.optional FROM dbversion dv

You have +08:00 in your input, but the -? in (?:Z|-?\d\d:\d\d)? only accounts for a negative offset or an offset with no sign.

Thus, on the first regex line, you should replace -? with [+-]? to match an optional - or +. Also, since the +08:00 part should not be part of Group 1, I suggest using a branch reset group, (?|...|...), so that whichever alternative matches, its capture ends up in the same group, Group 1:

(?|(\d{6}\s+\d{1,2}:\d\d:\d\d)|(\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+)(?:Z|[-+]?\d\d:\d\d)?)?
 ^^^                         ^ ^                                         ^     ^^^^         
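
As a minimal sketch of what the branch reset group changes, the two patterns below are simplified stand-ins (not the log pattern itself) that differ only in (?:...) versus (?|...). The names $plain, $reset and $date are made up for this illustration. Branch reset groups need Perl 5.10 or later.

```perl
use strict;
use warnings;

# Plain non-capturing group: every pair of parentheses gets its own
# number, so a match on the second alternative fills $2 and leaves $1 undef.
my $plain = qr{\A(?:(\d{6})|(\d{4}-\d\d-\d\d))\z};

# Branch reset group: numbering restarts at 1 in each alternative,
# so whichever alternative matches, the capture is $1.
my $reset = qr{\A(?|(\d{6})|(\d{4}-\d\d-\d\d))\z};

my $date = "2018-12-14";

# In list context a successful match returns one element per group.
my @plain = $date =~ $plain;
my @reset = $date =~ $reset;

print "plain: ", join(", ", map { defined $_ ? $_ : "undef" } @plain), "\n";
print "reset: ", join(", ", map { defined $_ ? $_ : "undef" } @reset), "\n";
# plain: undef, 2018-12-14
# reset: 2018-12-14
```

With the plain group the thread ID, command and argument would shift to groups 3, 4 and 5, which is why the four-variable list assignment in the question depends on the branch reset.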

Fixed pattern:

my $genlog_line_1= qr{
   \A
   (?|(\d{6}\s+\d{1,2}:\d\d:\d\d)|(\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+)(?:Z|[-+]?\d\d:\d\d)?)? # Timestamp
   \s+
   (?:\s*(\d+))                     # Thread ID
   \s
   (\w+)                            # Command
   \s+
   (.*)                             # Argument
   \Z
}xs;

See the regex demo.

Note that the ? after the branch reset group might not be necessary if the timestamp is always present in the input.
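
A quick way to check the fixed pattern, including a line with no timestamp, is a small script like the one below. Only the first sample line comes from the question. The other three are made-up inputs for illustration, and the @lines and @parsed names are hypothetical.

```perl
use strict;
use warnings;

# Fixed pattern from above, unchanged.
my $genlog_line_1 = qr{
   \A
   (?|(\d{6}\s+\d{1,2}:\d\d:\d\d)|(\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+)(?:Z|[-+]?\d\d:\d\d)?)? # Timestamp
   \s+
   (?:\s*(\d+))                     # Thread ID
   \s
   (\w+)                            # Command
   \s+
   (.*)                             # Argument
   \Z
}xs;

# First line is the question's input; the rest are assumed sample lines.
my @lines = (
    "2018-12-14T17:32:52.236100+08:00        477637459 Query SELECT dv.mandatory,dv.optional FROM dbversion dv",
    "2018-12-14T17:32:52.236100Z 12 Query SELECT 1",
    "181214 17:32:52\t   12 Query SELECT 2",
    "\t\t   13 Query SELECT 3",    # no timestamp at all
);

my @parsed;
for my $line (@lines) {
    # A failed match returns an empty list, so the condition is false.
    if (my ($ts, $thread_id, $cmd, $arg) = $line =~ $genlog_line_1) {
        push @parsed, [$ts, $thread_id, $cmd, $arg];
        printf "ts=%s thread_id=%s cmd=%s arg=%s\n",
            defined $ts ? $ts : "(none)", $thread_id, $cmd, $arg;
    }
    else {
        push @parsed, undef;
        print "no match: $line\n";
    }
}
```

The last line matches only because of the trailing ? on the timestamp group. In that case $ts is undef, so the script tests it with defined before printing.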

The main problem with your regex is that it does not take into account the +08:00 present in your $line.

Change it to:

\A(?:(\d{6}\s+\d{1,2}:\d\d:\d\d|\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+(?:Z|-?\d\d:\d\d)?))?(?:\+\d\d:\d\d)?\s+(?:\s*(\d+))\s+(\w+)\s+(.*)\Z

Demo:

https://regex101.com/r/fgRCv1/3
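
One behaviour of this single-line pattern is worth checking before relying on it: the positive offset is matched outside the capture group, while the original -?\d\d:\d\d branch is still inside it. The sketch below runs the pattern unchanged against the question's timestamp and against a made-up variant with -08:00. The variable names are hypothetical.

```perl
use strict;
use warnings;

# Single-line pattern from this answer, unchanged.
my $re = qr{\A(?:(\d{6}\s+\d{1,2}:\d\d:\d\d|\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+(?:Z|-?\d\d:\d\d)?))?(?:\+\d\d:\d\d)?\s+(?:\s*(\d+))\s+(\w+)\s+(.*)\Z};

# The -08:00 line is an assumed variant of the question's input.
my $plus  = "2018-12-14T17:32:52.236100+08:00        477637459 Query SELECT 1";
my $minus = "2018-12-14T17:32:52.236100-08:00        477637459 Query SELECT 1";

# List assignment keeps only the first capture group, the timestamp.
my ($ts_plus)  = $plus  =~ $re;
my ($ts_minus) = $minus =~ $re;

print "plus:  $ts_plus\n";
print "minus: $ts_minus\n";
# plus:  2018-12-14T17:32:52.236100
# minus: 2018-12-14T17:32:52.236100-08:00
```

So both lines match, but a negative offset ends up inside the captured timestamp. If the offset should never be captured, the branch reset version handles both signs the same way.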
