How can I create a conditional regex for a named capturing group?

We want to load our PMDF logs into Splunk, and I am trying to parse the PMDF SMTP log messages. I'm running into a problem where a named capturing group (dst_channel) may or may not have a value. Here is my regex so far:

\d{2}\-\w{3}\-\d{4}\s\d{2}\:\d{2}\:\d{2}\.\d{2}\s(?P<src_channel>\w+)\s+(?P<dst_channel>\w+)\s(?P<code>\w+)\s(?P<bytes>\d+)\s(?P<from>\w.+)\srfc822

I'm able to match the following message, in which tcp_msx_out_2 is the dst_channel:

02-Feb-2017 08:00:19.60 tcp_exempt   tcp_msx_out_2 E 2 mailman-bounces@list.xyz.com rfc822;user@xyz.com user@xyz.com <mailman.157.1486040414.29131.xxx@xxx.xyz.com> pmdf list.xyz.com ([x.x.x.x])

However, I'm not matching the following log line, which doesn't contain a dst_channel value:

02-Feb-2017 09:00:01.59 tcp_imap_int              Q 12 xxx@xyz.com rfc822;user@imap-internal.xyz.com user@imap.xyz.com <6940401380880269855036@PT-D69> pmdf  user@imap.xyz.com: smtp;452 4.2.2 Over quota

The next named capturing group I have is code (E in the first example message, Q in the second), and when the dst_channel is not there, the regex does not capture all of the codes.

How can I modify my regex so that if the dst_channel is there, it grabs the value, but if not, the regex carries on and still consistently grabs the values for the other named capturing groups?
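To make the problem concrete, here is a minimal sketch that reproduces it with Python's re module, which accepts the same (?P<name>...) syntax. The two log lines and the pattern are copied from the question; the variable names are only illustrative, and Splunk's own regex engine may differ in details.

```python
import re

# The original pattern from the question, unchanged.
ORIGINAL = re.compile(
    r"\d{2}\-\w{3}\-\d{4}\s\d{2}\:\d{2}\:\d{2}\.\d{2}\s"
    r"(?P<src_channel>\w+)\s+(?P<dst_channel>\w+)\s"
    r"(?P<code>\w+)\s(?P<bytes>\d+)\s(?P<from>\w.+)\srfc822"
)

# Sample lines from the question (split only for readability).
WITH_DST = (
    "02-Feb-2017 08:00:19.60 tcp_exempt   tcp_msx_out_2 E 2 "
    "mailman-bounces@list.xyz.com rfc822;user@xyz.com user@xyz.com "
    "<mailman.157.1486040414.29131.xxx@xxx.xyz.com> pmdf list.xyz.com ([x.x.x.x])"
)
WITHOUT_DST = (
    "02-Feb-2017 09:00:01.59 tcp_imap_int              Q 12 "
    "xxx@xyz.com rfc822;user@imap-internal.xyz.com user@imap.xyz.com "
    "<6940401380880269855036@PT-D69> pmdf  user@imap.xyz.com: smtp;452 4.2.2 Over quota"
)

hit = ORIGINAL.search(WITH_DST)
miss = ORIGINAL.search(WITHOUT_DST)

print(hit.group("dst_channel"), hit.group("code"))  # tcp_msx_out_2 E
print(miss)                                         # None
```

The second line fails because dst_channel requires at least one word character: the engine tries Q as the dst_channel and 12 as the code, then cannot find digits for bytes.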

It worked when I changed the \w+ in the dst_channel group to \w*:

\d{2}\-\w{3}\-\d{4}\s\d{2}\:\d{2}\:\d{2}\.\d{2}\s(?P<src_channel>\w+)\s+(?P<dst_channel>\w*)\s(?P<code>\w+)\s(?P<bytes>\d+)\s(?P<from>\w.+)\srfc822

You can test it here.
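As a rough check of this approach, the sketch below runs the \w* variant against the line without a dst_channel, again using Python's re module as a stand-in for Splunk. The names W_STAR and WITHOUT_DST are illustrative only.

```python
import re

# Same pattern as above, with \w* instead of \w+ in dst_channel.
W_STAR = re.compile(
    r"\d{2}\-\w{3}\-\d{4}\s\d{2}\:\d{2}\:\d{2}\.\d{2}\s"
    r"(?P<src_channel>\w+)\s+(?P<dst_channel>\w*)\s"
    r"(?P<code>\w+)\s(?P<bytes>\d+)\s(?P<from>\w.+)\srfc822"
)

# Sample line from the question with no dst_channel value.
WITHOUT_DST = (
    "02-Feb-2017 09:00:01.59 tcp_imap_int              Q 12 "
    "xxx@xyz.com rfc822;user@imap-internal.xyz.com user@imap.xyz.com "
    "<6940401380880269855036@PT-D69> pmdf  user@imap.xyz.com: smtp;452 4.2.2 Over quota"
)

fields = W_STAR.search(WITHOUT_DST).groupdict()

# dst_channel matched, but as an empty string rather than being absent.
print(repr(fields["dst_channel"]))       # ''
print(fields["code"], fields["bytes"])   # Q 12
```

Note that this only works when at least two whitespace characters separate src_channel from code, because \s+ and the following \s each need one.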

I suggest you use:

\d{2}-\w{3}-\d{4}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+(?P<src_channel>\w+)(?:\s+(?P<dst_channel>\w+))?\s+(?P<code>\w+)\s+(?P<bytes>\d+)\s+(?P<from>\S+)\s+rfc822
                                                                   ^^^                       ^^  

See the regex demo.

Basically, replace every \s with \s+ and make the dst_channel group optional by wrapping both the \s+ and the whole dst_channel group in an optional non-capturing group, (?:...)?.
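A minimal sketch of the suggested pattern in Python's re module, run against both sample lines from the question. The helper name parse is hypothetical, and the behaviour of an unmatched optional group (None here) is Python's; other engines may report it differently.

```python
import re

# Suggested pattern: \s+ everywhere, dst_channel wrapped in (?:...)?.
SUGGESTED = re.compile(
    r"\d{2}-\w{3}-\d{4}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+"
    r"(?P<src_channel>\w+)(?:\s+(?P<dst_channel>\w+))?\s+"
    r"(?P<code>\w+)\s+(?P<bytes>\d+)\s+(?P<from>\S+)\s+rfc822"
)

def parse(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = SUGGESTED.search(line)
    return m.groupdict() if m else None

# Sample lines from the question (split only for readability).
WITH_DST = (
    "02-Feb-2017 08:00:19.60 tcp_exempt   tcp_msx_out_2 E 2 "
    "mailman-bounces@list.xyz.com rfc822;user@xyz.com user@xyz.com "
    "<mailman.157.1486040414.29131.xxx@xxx.xyz.com> pmdf list.xyz.com ([x.x.x.x])"
)
WITHOUT_DST = (
    "02-Feb-2017 09:00:01.59 tcp_imap_int              Q 12 "
    "xxx@xyz.com rfc822;user@imap-internal.xyz.com user@imap.xyz.com "
    "<6940401380880269855036@PT-D69> pmdf  user@imap.xyz.com: smtp;452 4.2.2 Over quota"
)

first = parse(WITH_DST)
second = parse(WITHOUT_DST)

print(first["dst_channel"], first["code"], first["bytes"])     # tcp_msx_out_2 E 2
print(second["dst_channel"], second["code"], second["bytes"])  # None Q 12
```

When the optional group does not participate in the match, the engine falls back to matching code directly after src_channel, so the remaining groups stay aligned.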

Also, the from group pattern should be replaced with \S+ (one or more non-whitespace characters) because you want to match an email address, and .+ may (and usually does) overmatch.
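To illustrate the overmatching risk, here is a small sketch that compares the two from patterns in isolation. The input string is made up for the example (it is not a real PMDF log line) and simply contains the text rfc822 twice.

```python
import re

# Only the tail of each pattern, to isolate the from group.
GREEDY = re.compile(r"(?P<from>\w.+)\srfc822")
STRICT = re.compile(r"(?P<from>\S+)\s+rfc822")

# Hypothetical input: "rfc822" appears a second time later in the line.
line = "alice@xyz.com rfc822;bob@xyz.com subject mentions rfc822 too"

greedy_from = GREEDY.search(line).group("from")
strict_from = STRICT.search(line).group("from")

print(greedy_from)  # alice@xyz.com rfc822;bob@xyz.com subject mentions
print(strict_from)  # alice@xyz.com
```

The greedy .+ runs to the end of the line and backtracks only as far as the last " rfc822", while \S+ stops at the first whitespace character.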
