How can I use lookbehind in a C# Regex in order to skip matches of repeated prefix patterns?

Example - I'm trying to have the expression match all the b characters following any number of a characters:

Regex expression = new Regex("(?<=a).*");

foreach (Match result in expression.Matches("aaabbbb"))
  MessageBox.Show(result.Value);

This returns aabbbb, because the lookbehind matches only a single a. How can I make it match all the a's at the beginning?

I've tried

Regex expression = new Regex("(?<=a+).*");

and

Regex expression = new Regex("(?<=a)+.*");

with no results...

What I'm expecting is bbbb.

Are you looking for a repeated capturing group?

(.)\1*

This will return two matches.

Given:

aaabbbb

This will result in:

aaa
bbbb

This:

(?<=(.))(?!\1).*

uses the above principle: the lookbehind first captures the previous character into a backreference, and the negative lookahead then asserts that the next character is not that same character.

That matches:

bbbb
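As a rough sketch, both patterns from this answer can be tried against the sample input like this. The class name RepeatedGroupDemo and the helper MatchValues are made up for illustration; only Regex.Matches and Match.Value come from .NET itself.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RepeatedGroupDemo
{
    // Hypothetical helper: returns the text of every match, in order.
    public static string[] MatchValues(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    static void Main()
    {
        // Each run of one repeated character is reported as a separate match.
        foreach (string v in MatchValues(@"(.)\1*", "aaabbbb"))
            Console.WriteLine("run: '" + v + "'");

        // A match can only start where the previous character differs from the next one.
        foreach (string v in MatchValues(@"(?<=(.))(?!\1).*", "aaabbbb"))
            Console.WriteLine("after prefix: '" + v + "'");
    }
}
```

Because .* can match nothing, the second pattern may also report a zero-length match at the very end of the input, so filtering out empty values before using the results is a reasonable precaution.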

I figured it out eventually:

Regex expression = new Regex("(?<=a+)[^a]+");

foreach (Match result in expression.Matches(@"aaabbbb"))
   MessageBox.Show(result.Value);

I must not allow the a's to be matched by the part of the pattern outside the lookbehind. This way, the expression matches only runs of non-a characters (here, the b repetitions) that follow a run of a's.

Matching aaabbbb yields bbbb, and matching aaabbbbcccbbbbaaaaaabbzzabbb results in bbbbcccbbbb, bbzz and bbb.

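The reason the lookbehind skips only the first "a" is that it is a zero-width assertion: it checks the character before the current position without consuming or capturing it. It is satisfied as soon as one "a" precedes the current position, which first happens right after the first "a", and .* then matches everything from there on, including the remaining a's.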
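To see where each match actually starts, a small sketch can print Match.Index next to the matched value. The class name ZeroWidthDemo and the helper First are invented for this example; the two patterns are the ones from the question and from the self-answer above.

```csharp
using System;
using System.Text.RegularExpressions;

static class ZeroWidthDemo
{
    // Hypothetical helper: first match of the pattern in the input.
    public static Match First(string pattern, string input) => Regex.Match(input, pattern);

    static void Main()
    {
        // The lookbehind only needs one 'a' before the start, so the match begins at index 1.
        Match m = First("(?<=a).*", "aaabbbb");
        Console.WriteLine(m.Index + " '" + m.Value + "'");  // 1 'aabbbb'

        // Excluding 'a' from the matched part pushes the start past the whole prefix.
        Match n = First("(?<=a+)[^a]+", "aaabbbb");
        Console.WriteLine(n.Index + " '" + n.Value + "'");  // 3 'bbbb'
    }
}
```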

Would this pattern work for you instead? New pattern: \ba+(.+)\b (the backslashes must be doubled if you write it in a regular, non-verbatim string). It uses a word boundary \b to anchor both ends of the word. It matches at least one "a" followed by the rest of the characters up to the closing word boundary. The remaining characters are captured in a group so you can reference them easily.

string pattern = @"\ba+(.+)\b";

foreach (Match m in Regex.Matches("aaabbbb", pattern))
{
    Console.WriteLine("Match: " + m.Value);
    Console.WriteLine("Group capture: " + m.Groups[1].Value);
}

UPDATE: If you want to skip the leading run of any repeated character (not just "a") and then match the rest of the string, you could do this:

string pattern = @"\b(.)(\1)*(?<Content>.+)\b";

foreach (Match m in Regex.Matches("aaabbbb", pattern))
{
    Console.WriteLine("Match: " + m.Value);
    Console.WriteLine("Group capture: " + m.Groups["Content"].Value);
}


 