
Optional group in regular expression

I have lines I'm trying to match that could be:

ACDNT: BLAHBLAH COUNTY, NC
ACDNT: BLAHBLAH COUNTY, NC PERS INJ
ACDNT: BLAHBLAH COUNTY, NC CMV
ACDNT: SOMEWHERE ELSE

So I want a regular expression that matches "ACDNT: ", a location with or without the NC county, and then either nothing, "PERS INJ", or "CMV". I want to capture the location and the 'extra' (PERS INJ or CMV) in groups.

I'm trying:

(ACDNT: +)(.*)( +(CMV|PERS INJ))?

with the test string:

ACDNT: SOMEWHERE PERS INJ

and regex101 (with the Java option) matches "SOMEWHERE PERS INJ" as group 2. I was expecting "PERS INJ" to be in its own group.

I thought the trailing question mark would make the group enclosing the space and the last text optional. How would I alter the regular expression to do that?

To summarize, I want to match the location (whether it's an NC county or not) as its own group, then have an optional group that holds one of the two 'extra' strings if it is there.

("A programmer had a problem and decided to solve it with regular expressions. Now he has two problems...")

Try:

(ACDNT: +)(.*?)( +(CMV|PERS INJ))?$

Your problem is that .* is greedy and consumes the entire rest of the string, which is why you're seeing "SOMEWHERE PERS INJ" all in the same group. Because the trailing group is optional, the match already succeeds without it, so the engine never backtracks to give those characters up. I changed * to *? to make it reluctant instead of greedy, and I added $ at the end to force the matcher to consider the whole string. Without the $, the reluctant .*? would match as little as possible (nothing at all) and the match would end right after "ACDNT: ".
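To make the difference concrete, here is a minimal Java sketch that runs both the pattern from the question and the suggested one against the sample lines. The class name OptionalGroupDemo and the helper parse are made up for illustration; only java.util.regex from the standard library is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class OptionalGroupDemo {
    // Pattern from the question: the greedy .* swallows the optional suffix.
    static final Pattern GREEDY =
        Pattern.compile("(ACDNT: +)(.*)( +(CMV|PERS INJ))?");

    // Suggested pattern: reluctant .*? plus a $ anchor.
    static final Pattern RELUCTANT =
        Pattern.compile("(ACDNT: +)(.*?)( +(CMV|PERS INJ))?$");

    // Returns {location, extra}, or null if the line does not match.
    // extra is null when the optional group did not take part in the match.
    static String[] parse(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(2), m.group(4) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "ACDNT: BLAHBLAH COUNTY, NC",
            "ACDNT: BLAHBLAH COUNTY, NC PERS INJ",
            "ACDNT: BLAHBLAH COUNTY, NC CMV",
            "ACDNT: SOMEWHERE ELSE",
            "ACDNT: SOMEWHERE PERS INJ"
        };
        for (String line : lines) {
            String[] greedy = parse(GREEDY, line);
            String[] reluctant = parse(RELUCTANT, line);
            System.out.println(line);
            System.out.println("  greedy:    location=[" + greedy[0]
                + "] extra=[" + greedy[1] + "]");
            System.out.println("  reluctant: location=[" + reluctant[0]
                + "] extra=[" + reluctant[1] + "]");
        }
    }
}
```

For "ACDNT: SOMEWHERE PERS INJ", the greedy pattern yields the location "SOMEWHERE PERS INJ" with a null extra, while the reluctant pattern yields "SOMEWHERE" and "PERS INJ". Note that group(4) returns null, not an empty string, when the optional group is absent, so check for null before using it.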

There are still some caveats. Note that an input consisting of "ACDNT: " followed by any string will still be a successful match. You could help address this by being more specific about what's allowed for the location instead of using .*.
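As one possible way to tighten the location, the sketch below replaces .* with a narrower sub-pattern. It assumes, purely for illustration, that a location is a run of upper-case words separated by single spaces, each optionally preceded by a comma (as in "BLAHBLAH COUNTY, NC"); real data may need a different character set. The class name StricterLocationDemo and the group names location and extra are invented here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the location grammar below is an assumption.
class StricterLocationDemo {
    // ASSUMPTION: a location is one or more upper-case words, separated by
    // a single space and optionally a preceding comma. The repetition is
    // reluctant (*?) so it does not absorb a trailing CMV / PERS INJ.
    static final Pattern STRICT = Pattern.compile(
        "^ACDNT: +(?<location>[A-Z]+(?:,? [A-Z]+)*?)"
        + "(?: +(?<extra>CMV|PERS INJ))?$");

    // Returns {location, extra}, or null if the whole line does not match.
    // extra is null when no CMV / PERS INJ suffix is present.
    static String[] parse(String line) {
        Matcher m = STRICT.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("location"), m.group("extra") };
    }

    public static void main(String[] args) {
        String[] lines = {
            "ACDNT: BLAHBLAH COUNTY, NC PERS INJ",
            "ACDNT: SOMEWHERE ELSE",
            "ACDNT: 12345 ???"
        };
        for (String line : lines) {
            String[] parsed = parse(line);
            if (parsed == null) {
                System.out.println(line + " -> no match");
            } else {
                System.out.println(line + " -> location=[" + parsed[0]
                    + "] extra=[" + parsed[1] + "]");
            }
        }
    }
}
```

With this pattern, a line such as "ACDNT: 12345 ???" or a bare "ACDNT: " is rejected, whereas the .*? version would accept both. The trade-off is that any legitimate location containing digits, lower-case letters, or other punctuation would also be rejected unless the character class is widened.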
