
C# Regex Match parenthesis group except if it contains specified word

I have a long string:

(Today is a blue day) (Today is a good day) (Today is a BAD day) (Today is a green day) (Today is a blue day)

I want to match the parenthesized groups, except those that contain the capitalized word BAD. BAD will always be fully capitalized. It may not be the only fully capitalized word, but it will be the only word that is exactly BAD.

The string is very long, and I want to change the parenthesized groups that do not contain the word BAD while leaving the BAD group untouched. I was hoping to avoid iterating over every single group to check whether it contains BAD.

This pattern matches my parenthesized groups:

\(.+?\)
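For comparison, the iterate-and-filter approach the question hopes to avoid could look roughly like this. It is a minimal sketch: the variable names are illustrative, and it assumes the groups never nest, which is what lets the lazy \(.+?\) stop at the right ).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = "(Today is a blue day) (Today is a good day) (Today is a BAD day) (Today is a green day) (Today is a blue day)";

// Step 1: grab every parenthesized group with the lazy pattern from the question.
var allGroups = Regex.Matches(text, @"\(.+?\)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

// Step 2: drop the groups that contain the whole word BAD (case-sensitive).
var withoutBad = allGroups
    .Where(g => !Regex.IsMatch(g, @"\bBAD\b"))
    .ToList();

Console.WriteLine(allGroups.Count); // 5
Console.WriteLine(string.Join(Environment.NewLine, withoutBad));
```

This needs two regex passes; the answers below fold the BAD check into a single pattern.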

I have tried:

\(.+?(?=\bBAD\b).+?\) - this returns a single match that runs from the first group up to and including the group containing BAD.

(?=\bBAD\b).+?\) - this matches only the end of the group, "BAD day)".

I tried a few variations of negative lookbehinds but could not get them to provide a result.

This seems to work:

\(.[^BAD]+?\)

until the string includes (Today is a Blue day), and then it fails.

Does anyone know an effective way to do this?

You can use

\((?>([^()]*\bBAD\b)?)[^()]*\)(?(1)(?!))

See the .NET regex demo. Details:

  • \( - a ( char
  • (?>([^()]*\bBAD\b)?) - an atomic group (it disallows re-trying its pattern when backtracking occurs) around an optional Group 1: zero or more chars other than ( and ), then the whole word BAD
  • [^()]* - zero or more chars other than ( and )
  • \) - a ) char
  • (?(1)(?!)) - a conditional: if Group 1 was matched, (?!) forces a failure. Because the preceding group is atomic, the engine cannot backtrack into it and retry without Group 1, so the match attempt at this ( fails.
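To see why the atomic group matters, the sketch below runs the pattern next to a hypothetical variant with the (?>...) wrapper removed (the variant is for illustration only, not part of the answer). Without atomicity, the engine backtracks into the optional Group 1, takes its zero-iteration branch, uncaptures the group, and the conditional then lets the BAD group through.

```csharp
using System;
using System.Text.RegularExpressions;

var text = "(Today is a blue day) (Today is a good day) (Today is a BAD day) (Today is a green day) (Today is a blue day)";

// Pattern from the answer: Group 1 sits inside an atomic group (?>...).
var atomic = @"\((?>([^()]*\bBAD\b)?)[^()]*\)(?(1)(?!))";

// Hypothetical variant with the atomic group removed, for comparison only.
var nonAtomic = @"\(([^()]*\bBAD\b)?[^()]*\)(?(1)(?!))";

// Local helper (name is illustrative): number of matches of a pattern in text.
int CountMatches(string pattern) => Regex.Matches(text, pattern).Count;

Console.WriteLine(CountMatches(atomic));    // 4: the BAD group is rejected
Console.WriteLine(CountMatches(nonAtomic)); // 5: backtracking skips the optional Group 1
```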

See the C# demo:

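using System.Linq;
using System.Text.RegularExpressions;

var text = "(Today is a blue day) (Today is a good day) (Today is a BAD day) (Today is a green day) (Today is a blue day)";
// Collect every (...) group whose contents do not include the whole word BAD
var matches = Regex.Matches(text, @"\((?>([^()]*\bBAD\b)?)[^()]*\)(?(1)(?!))")
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();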

Output:

(Today is a blue day)
(Today is a good day)
(Today is a green day)
(Today is a blue day)
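Since the question's actual goal is to change the non-BAD groups, the same pattern can be passed to Regex.Replace with a MatchEvaluator. This is a minimal sketch: ToUpperInvariant is only a stand-in for whatever transformation is really needed.

```csharp
using System;
using System.Text.RegularExpressions;

var text = "(Today is a blue day) (Today is a good day) (Today is a BAD day) (Today is a green day) (Today is a blue day)";
var pattern = @"\((?>([^()]*\bBAD\b)?)[^()]*\)(?(1)(?!))";

// Only groups that do NOT contain the whole word BAD reach the evaluator,
// so the BAD group is copied to the output unchanged.
// ToUpperInvariant is a placeholder for the real change.
var result = Regex.Replace(text, pattern, m => m.Value.ToUpperInvariant());

Console.WriteLine(result);
// (TODAY IS A BLUE DAY) (TODAY IS A GOOD DAY) (Today is a BAD day) (TODAY IS A GREEN DAY) (TODAY IS A BLUE DAY)
```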

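This part (?=\bBAD\b).+?\) asserts that BAD is to the right and then matches as few characters as possible up to the next ). For this input it can also be written without the lookahead: \bBAD\b.+?\). In the other attempt, \(.+?(?=\bBAD\b).+?\), the . also matches ( and ), so .+? keeps extending across group boundaries until it reaches BAD.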
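The following sketch runs the question's two attempts (plus the lookahead-free rewrite) against the sample string to show what each one actually returns. Variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = "(Today is a blue day) (Today is a good day) (Today is a BAD day) (Today is a green day) (Today is a blue day)";

// First attempt: '.' also matches ')' and '(', so the lazy .+? crosses
// group boundaries until the lookahead finds BAD.
var spanning = Regex.Matches(text, @"\(.+?(?=\bBAD\b).+?\)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

// Second attempt and its lookahead-free rewrite: both start matching at BAD.
var fromLookahead = Regex.Match(text, @"(?=\bBAD\b).+?\)").Value;
var withoutLookahead = Regex.Match(text, @"\bBAD\b.+?\)").Value;

Console.WriteLine(spanning.Count);   // 1
Console.WriteLine(spanning[0]);      // (Today is a blue day) (Today is a good day) (Today is a BAD day)
Console.WriteLine(fromLookahead);    // BAD day)
Console.WriteLine(withoutLookahead); // BAD day)
```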

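This part [^BAD] is a negated character class: it matches any single character except B, A and D. It does not exclude the word BAD as a whole, which is why (Today is a Blue day) fails: the uppercase B in Blue cannot be matched.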
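A quick check of the question's \(.[^BAD]+?\) pattern on single groups makes the character-class behavior visible. This is a minimal sketch; the variable names are illustrative. Note that the class is case-sensitive, so lowercase a, b and d are still allowed.

```csharp
using System;
using System.Text.RegularExpressions;

// [^BAD] excludes the single characters 'B', 'A' and 'D', not the word BAD.
const string pattern = @"\(.[^BAD]+?\)";

bool lowerBlue = Regex.IsMatch("(Today is a blue day)", pattern); // True: no uppercase B, A or D after "(T"
bool badGroup  = Regex.IsMatch("(Today is a BAD day)", pattern);  // False
bool upperBlue = Regex.IsMatch("(Today is a Blue day)", pattern); // False: uppercase B in "Blue"

Console.WriteLine($"{lowerBlue} {badGroup} {upperBlue}"); // True False False
```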

You can do the opposite and use a negative lookahead to assert that BAD does not occur between the parentheses. You might also add word boundaries \b to prevent a partial match, such as BAD inside a longer word.

\((?![^()]*\bBAD\b[^()]*\))[^()]*\)

The pattern matches:

  • \( Match (
  • (?![^()]*\bBAD\b[^()]*\)) Negative lookahead: assert that the run of non-parenthesis characters up to the next ) does not contain the whole word BAD
  • [^()]* Match zero or more chars other than ( and ) using a negated character class
  • \) Match )
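The sketch below runs the negative-lookahead pattern with and without the word boundaries. The extra group "(Today is a BADGE day)" is a made-up sample, added only to show the partial-match case the boundaries guard against.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up input: "(Today is a BADGE day)" is not from the question.
var text = "(Today is a blue day) (Today is a BAD day) (Today is a BADGE day)";

var withBoundaries = @"\((?![^()]*\bBAD\b[^()]*\))[^()]*\)";
var withoutBoundaries = @"\((?![^()]*BAD[^()]*\))[^()]*\)";

// Local helper (name is illustrative): all matched groups, joined by spaces.
string Run(string pattern) =>
    string.Join(" ", Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value));

// With \b, only the exact word BAD excludes a group, so BADGE is kept.
Console.WriteLine(Run(withBoundaries));    // (Today is a blue day) (Today is a BADGE day)

// Without \b, "BAD" inside "BADGE" also excludes that group.
Console.WriteLine(Run(withoutBoundaries)); // (Today is a blue day)
```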

See the regex demo.
