How to match an even number of any character in a string?

I have a string:

aaabbashasccddee

I want to match runs of consecutive identical characters whose length is even. For example, from the string above, I want these matches:

[bb],[cc],[dd],[ee]

I have tried this pattern, but it's not even close:

^(..)*$

Any help, please.

Fortunately, .NET regular expressions support variable-length (unbounded) lookbehinds. What you need can be achieved with the following regex:

((?>(?(2)(?=\2))(.)\2)+)(?<!\2\1)(?!\2)


Regex breakdown:

  • ( Start of capturing group #1
    • (?> Start of atomic group (non-capturing)
      • (?(2) If capturing group #2 is set
        • (?=\2) The next character must be the one stored in group #2
      • ) End of conditional
      • (.)\2 Match and capture a character, then match it again (an even count)
    • )+ Repeat as many times as possible, at least once
  • ) End of capturing group #1
  • (?<!\2\1) Here is the trick. This lookbehind asserts that the character immediately before everything matched so far is not the character stored in capturing group #2
  • (?!\2) The next character must not be the character stored in capturing group #2
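
To see the effect of the lookbehind and lookahead described above, here is a minimal sketch that runs the same pattern over a few extra inputs. The helper name EvenRuns and the extra sample strings are illustrative assumptions and are not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern copied verbatim from the answer above.
const string Pattern = @"((?>(?(2)(?=\2))(.)\2)+)(?<!\2\1)(?!\2)";

// Hypothetical helper: returns the text of every match as a string array.
string[] EvenRuns(string input) =>
    Regex.Matches(input, Pattern).Cast<Match>().Select(m => m.Value).ToArray();

// "aaaa" is an even run and should match as a whole; "aaa" and the "bbb"
// in "xaabbb" are odd runs, so no part of them should match.
foreach (var sample in new[] { "aaabbashasccddee", "aaaa", "aaa", "xaabbb" })
    Console.WriteLine($"{sample} -> [{string.Join(",", EvenRuns(sample))}]");
```

The odd runs are rejected entirely rather than being trimmed to an even prefix or suffix, which is what the two lookarounds are there for.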

UPDATE:

You can use the following C# code to get all even-length runs of identical characters in a string using the regex alone, with no other operations (pure regex).

var allEvenSequences = Regex.Matches("aaabbashasccddee", @"((?>(?(2)(?=\2))(.)\2)+)(?<!\2\1)(?!\2)").Cast<Match>().ToList();

Also, if you want to produce [bb],[cc],[dd],[ee], you can join the matches:

string strEvenSequences = string.Join(",", allEvenSequences.Select(x => $"[{x.Value}]"));
// strEvenSequences will be [bb],[cc],[dd],[ee]

Another possible regex-only solution that doesn't involve conditionals:

(.)(?<!\1\1)\1(?:\1\1)*(?!\1)

Breakdown:

(.)         # First capturing group - matches any character.
(?<!\1\1)   # Negative lookbehind - ensures the matched char isn't preceded by the same char.
\1          # Matches the captured character again (at least two in total).
(?:\1\1)    # A non-capturing group that matches two more occurrences of the same char.
*           # Repeats the previous group zero or more times.
(?!\1)      # Negative lookahead - ensures no extra occurrence of the char follows.

Demo:

using System;
using System.Text.RegularExpressions;

string input = "aaabbashasccddee";
string pattern = @"(.)(?<!\1\1)\1(?:\1\1)*(?!\1)";
var matches = Regex.Matches(input, pattern);
// Print each even-length run on its own line.
foreach (Match m in matches)
    Console.WriteLine(m.Value);

Output:

bb
cc
dd
ee
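
As a quick sanity check, the two patterns from this thread can be run side by side. The sketch below does that on a handful of inputs; the helper name JoinMatches, the variable names, and the extra sample strings are assumptions made for illustration. Agreement on these samples is not a proof that the patterns are equivalent in general.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Both patterns are copied verbatim from the answers above.
string conditionalPattern = @"((?>(?(2)(?=\2))(.)\2)+)(?<!\2\1)(?!\2)";
string lookaroundPattern = @"(.)(?<!\1\1)\1(?:\1\1)*(?!\1)";

// Hypothetical helper: joins all matches of a pattern into one comma-separated string.
string JoinMatches(string input, string pattern) =>
    string.Join(",", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

foreach (var sample in new[] { "aaabbashasccddee", "aaaa", "aaa", "xaabbb" })
{
    string first = JoinMatches(sample, conditionalPattern);
    string second = JoinMatches(sample, lookaroundPattern);
    string verdict = first == second ? "same" : "different";
    Console.WriteLine($"{sample}: [{first}] vs [{second}] -> {verdict}");
}
```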

