How to match an even number of any character in a string?

I have a string:

aaabbashasccddee

I want to match every run of consecutive identical characters whose length is even. For example, from the string above, I want these matches:

[bb],[cc],[dd],[ee]

I have tried this solution but it's not even close:

^(..)*$

Any help would be appreciated.

Fortunately, .NET regular expressions support variable-length (unbounded) lookbehinds. What you need can be achieved with the following regex:

((?>(?(2)(?=\2))(.)\2)+)(?<!\2\1)(?!\2)

Regex breakdown:

  • ( Start of capturing group #1
    • (?> Start of atomic (non-capturing) group
      • (?(2) If capturing group #2 is set
        • (?=\2) The next character must be the one stored in group #2
      • ) End of conditional
      • (.)\2 Match and capture a character, then match it again (one pair, so the total stays even)
    • )+ Repeat as many times as possible, at least once
  • ) End of capturing group #1
  • (?<!\2\1) Here is the trick. Looking back from the end of the match, the text must not be the character in group #2 followed by everything matched in group #1; in other words, the character immediately before the match must not be that same character
  • (?!\2) The next character must not be the character stored in capturing group #2
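
As a sanity check on the breakdown above, here is a minimal sketch that restates the same three conditions (same character repeated, even length, no identical neighbour on either side) as a plain loop and compares it with the regex. The class name EvenRunCheck and the helper EvenRuns are hypothetical names made up for this illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class EvenRunCheck
{
    // Hypothetical helper: collects every maximal run of one repeated
    // character whose length is even.
    public static List<string> EvenRuns(string s)
    {
        var runs = new List<string>();
        int i = 0;
        while (i < s.Length)
        {
            // Advance j to the end of the run that starts at i.
            int j = i;
            while (j < s.Length && s[j] == s[i]) j++;

            // Keep the run only if its length is even.
            if ((j - i) % 2 == 0) runs.Add(s.Substring(i, j - i));
            i = j;
        }
        return runs;
    }

    public static void Main()
    {
        const string input = "aaabbashasccddee";
        const string pattern = @"((?>(?(2)(?=\2))(.)\2)+)(?<!\2\1)(?!\2)";

        var fromRegex = Regex.Matches(input, pattern)
                             .Cast<Match>()
                             .Select(m => m.Value)
                             .ToList();
        var fromLoop = EvenRuns(input);

        Console.WriteLine(string.Join(",", fromLoop));        // bb,cc,dd,ee
        Console.WriteLine(fromRegex.SequenceEqual(fromLoop)); // True
    }
}
```

The loop is only there to make the intent of the lookbehind and lookahead explicit; the regex alone is enough in practice.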

UPDATE:

You can use the following C# code to get all even-length runs in the string with the regex alone, without any extra logic (it needs `using System.Linq;` and `using System.Text.RegularExpressions;`):

var allEvenSequences = Regex.Matches("aaabbashasccddee", @"((?>(?(2)(?=\2))(.)\2)+)(?<!\2\1)(?!\2)").Cast<Match>().ToList();

If you want the result formatted as [bb],[cc],[dd],[ee], join the matches:

string strEvenSequences = string.Join(",", allEvenSequences.Select(m => $"[{m.Value}]"));
// strEvenSequences will be [bb],[cc],[dd],[ee]

Another possible regex-only solution that doesn't involve conditionals:

(.)(?<!\1\1)\1(?:\1\1)*(?!\1)

Breakdown:

(.)         # First capturing group - matches any character.
(?<!\1\1)   # Negative lookbehind - ensures the matched char isn't preceded by the same char.
\1          # Matches the same character again (so at least two in total).
(?:\1\1)    # A non-capturing group that matches two occurrences of the same char.
*           # Matches between zero and unlimited times of the previous group.
(?!\1)      # Negative lookahead to make sure no extra occurrence of the char follows.

Demo:

using System;
using System.Text.RegularExpressions;

string input = "aaabbashasccddee";
string pattern = @"(.)(?<!\1\1)\1(?:\1\1)*(?!\1)";
var matches = Regex.Matches(input, pattern);
// Print each even-length run on its own line.
foreach (Match m in matches)
    Console.WriteLine(m.Value);

Output:

bb
cc
dd
ee
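
To see how the lookbehind and lookahead treat odd and longer even runs, here is a small sketch that feeds a few extra inputs to the same pattern. The inputs and the names EvenRunEdgeCases and Find are assumptions chosen for illustration only; they do not come from the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class EvenRunEdgeCases
{
    // Same pattern as in the demo above.
    const string Pattern = @"(.)(?<!\1\1)\1(?:\1\1)*(?!\1)";

    // Hypothetical helper: returns the matched substrings for one input.
    public static string[] Find(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        // Odd runs (3, 5) should yield nothing; even runs are returned whole.
        foreach (var input in new[] { "aaa", "aaaa", "aaaaa", "xxyyyzz" })
            Console.WriteLine($"{input} -> [{string.Join(",", Find(input))}]");
    }
}
```

This prints:

aaa -> []
aaaa -> [aaaa]
aaaaa -> []
xxyyyzz -> [xx,zz]

Note that `.` does not match a newline by default, so runs of line breaks are ignored unless RegexOptions.Singleline is used.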
