
C# Regex: match a parenthesis group unless it contains a specified word

I have a long string:

(Today is a blue day) (Today is a good day) (Today is a BAD day) (Today is a green day) (Today is a blue day)

I want to match each parenthesized group unless it contains the capitalized word BAD. The word will always be fully capitalized. It may not be the only fully capitalized word in the string, but it will be the only word that is exactly BAD.

The string is very long, and I want to change the parenthesized groups that do not contain the word BAD while leaving the groups that contain BAD alone. I was hoping to avoid iterating over every single group to check whether it contains BAD.

This pattern matches my parenthesized groups:

\(.+?\)

I have tried:

\(.+?(?=\bBAD\b).+?\) - this produces a single match running from the first ( through the ) of the group containing BAD.

(?=\bBAD\b).+?\) - this matches only the end of the group, "BAD day)".

I tried a few variations of negative lookbehinds but could not get them to produce a result.

I know this works:

\(.[^BAD]+?\)

Until you include (Today is a Blue day), and then it fails.

Does anyone know an effective way to do this?

You can use

\((?>([^()]*\bBAD\b)?)[^()]*\)(?(1)(?!))

See the .NET regex demo. Details:

  • \( - a ( char
  • (?>([^()]*\bBAD\b)?) - an atomic group (one whose pattern is not re-tried when backtracking occurs) holding an optional capturing group: zero or more chars other than ( and ), followed by the whole word BAD, all captured into Group 1
  • [^()]* - zero or more chars other than ( and )
  • \) - a ) char
  • (?(1)(?!)) - a conditional: if Group 1 was matched, (?!) forces a failure and triggers backtracking. Because the atomic group cannot be re-tried, the match at this position fails.

See the C# demo:

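using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = "(Today is a blue day) (Today is a good day) (Today is a BAD day) (Today is a green day) (Today is a blue day)";
var matches = Regex.Matches(text, @"\((?>([^()]*\bBAD\b)?)[^()]*\)(?(1)(?!))")
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();
Console.WriteLine(string.Join("\n", matches)); // one matched group per line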

Output:

(Today is a blue day)
(Today is a good day)
(Today is a green day)
(Today is a blue day)
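
Since the question is about changing the groups that do not contain BAD, the same pattern can also be handed to Regex.Replace with a match evaluator. The following is only a sketch: the shortened input and the transformation (wrapping each matched group in square brackets) are assumptions made for illustration, not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as in the answer above.
var pattern = @"\((?>([^()]*\bBAD\b)?)[^()]*\)(?(1)(?!))";
var text = "(Today is a blue day) (Today is a BAD day) (Today is a green day)";

// Hypothetical transformation: wrap each matched group in square brackets.
// Groups containing the whole word BAD are never matched, so they stay as-is.
var result = Regex.Replace(text, pattern, m => "[" + m.Value + "]");

Console.WriteLine(result);
// [(Today is a blue day)] (Today is a BAD day) [(Today is a green day)]
```

Any other per-group change can go inside the lambda, which receives each Match and returns its replacement text.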

This part (?=\bBAD\b).+?\) asserts BAD to the right and then matches as few characters as possible until the next ). It can also be written without the lookahead: \bBAD\b.+?\)

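This part [^BAD] is a negated character class: it matches any single character except B, A or D. It does not exclude the word BAD, which is why the pattern fails as soon as a group contains any of those capital letters, such as the B in Blue.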
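
A small check makes the character-class behavior visible. This is a sketch using the pattern from the question and two of its sample groups; the variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the question: [^BAD] excludes the single characters B, A and D.
var pattern = @"\(.[^BAD]+?\)";

// No capital B, A or D after the first character, so the group matches.
bool plain = Regex.IsMatch("(Today is a blue day)", pattern);

// The capital B in "Blue" is rejected by [^BAD], so the group does not match,
// even though the word BAD never appears.
bool blue = Regex.IsMatch("(Today is a Blue day)", pattern);

Console.WriteLine(plain); // True
Console.WriteLine(blue);  // False
```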

Instead, you can do the opposite: use a negative lookahead to assert that BAD does not occur between the parentheses, and add word boundaries \b to prevent a partial match.

\((?![^()]*\bBAD\b[^()]*\))[^()]*\)

The pattern matches:

  • \( Match (
  • (?![^()]*\bBAD\b[^()]*\)) Negative lookahead: assert that what follows is not zero or more non-parenthesis chars, then the whole word BAD, then more non-parenthesis chars up to the first closing parenthesis to the right
  • [^()]* Match 0+ times any char except ( and ) using a negated character class
  • \) Match )

Regex demo
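
In C#, the negative-lookahead pattern can be used in the same way as the other one. A minimal sketch follows; the sample text is made up for illustration and includes BADLY to show that the word boundaries prevent a partial match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var pattern = @"\((?![^()]*\bBAD\b[^()]*\))[^()]*\)";

// Illustrative input: only the first group contains the whole word BAD.
var text = "(Today is a BAD day) (Today went BADLY) (Today is a good day)";

var kept = Regex.Matches(text, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

Console.WriteLine(string.Join(" | ", kept));
// (Today went BADLY) | (Today is a good day)
```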
