Odd Regex behaviour in C#

I have made the following regex:

(?<=^PR)(?:[gpr])?([A-Z]{2,3})(?:vB)?(?=\d{4}$)

It works in any regex tester. However, when I try it in C#, it behaves a bit oddly. Say I test it against these three strings:

PRPCP2008, PRrSV2012 and PRBP2006

A regex tester matches the following:

PCP, SV and BP

This is what I want to happen. I only care about the 2 or 3 uppercase letters between "PR" and the 4-digit year. I do look for the lowercase characters, but I don't want them in the match. Now, when I use the same regex in C#, I get different matches:

PCP, rSV and BP

PCP and BP are still the same, but now the match also includes the lowercase 'r'. Is there a reason this happens in C#? Or did I just stumble upon a faulty regex tester?

If you'd like to test the regex yourself, this is the tester I used: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

EDIT: All right, here is the code:

using System;
using System.Text.RegularExpressions;

string regexPattern = @"(?<=^PR)(?:[gpr])?([A-Z]{2,3})(?:vB)?(?=\d{4}$)";

Regex regex = new Regex(regexPattern, RegexOptions.None);
Match match = regex.Match("PRrSV2012");

// Prints "rSV", not "SV"
Console.WriteLine(match.Value);
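A minimal sketch that reproduces the difference, assuming a .NET 6+ console project with top-level statements: it runs all three sample strings through the unchanged pattern and prints both the overall match and capture group 1. As the comments note, a non-capturing group (?:...) only avoids creating a numbered group; the characters it matches are still consumed, so they are still part of Match.Value. The array names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's pattern, unchanged.
var regex = new Regex(@"(?<=^PR)(?:[gpr])?([A-Z]{2,3})(?:vB)?(?=\d{4}$)");

string[] inputs = { "PRPCP2008", "PRrSV2012", "PRBP2006" };
var wholeMatches = new string[inputs.Length];
var group1Values = new string[inputs.Length];

for (int i = 0; i < inputs.Length; i++)
{
    Match m = regex.Match(inputs[i]);
    // (?:[gpr]) does not capture, but it still consumes the 'r',
    // so the 'r' is part of the overall match (m.Value).
    wholeMatches[i] = m.Value;
    // Groups[1] is only the text captured by ([A-Z]{2,3}).
    group1Values[i] = m.Groups[1].Value;
    Console.WriteLine($"{inputs[i]}: Value = {m.Value}, Groups[1] = {m.Groups[1].Value}");
}
```

For PRrSV2012 this prints Value = rSV but Groups[1] = SV; for the other two strings both are the same.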

You are looking at the value of the whole match, but you need the value of a group.

using System;
using System.Linq;
using System.Text.RegularExpressions;

var rx = new Regex(@"(?<=^PR)(?:[gpr])?(?'interest'[A-Z]{2,3})(?:vB)?(?=\d{4}$)", RegexOptions.None);
var items = new[] { "PRPCP2008", "PRrSV2012", "PRBP2006", "Foo" };
var results = items.Select(i => new { i, isMatch = rx.IsMatch(i), value = rx.Matches(i).Cast<Match>().Select(m => m.Groups["interest"].Value).FirstOrDefault() });

foreach (var r in results) Console.WriteLine($"{r.i} {r.isMatch} {r.value ?? "null"}");

Result:

PRPCP2008 True PCP 
PRrSV2012 True SV 
PRBP2006 True BP 
Foo False null 

That is why I always use explicitly named groups in my expressions.
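Because the pattern is anchored with ^ and $, each input can match at most once, so the Matches(...).Cast<Match>()...FirstOrDefault() chain can be reduced to a single Regex.Match call plus a Success check. A sketch of that variant, assuming .NET Core 3.0+ (for the tuple overload of Zip); the Extract helper is a hypothetical name, and (?<interest>...) is the angle-bracket spelling of the same named group.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern as above; (?<interest>...) is equivalent to (?'interest'...).
var rx = new Regex(@"(?<=^PR)(?:[gpr])?(?<interest>[A-Z]{2,3})(?:vB)?(?=\d{4}$)");

// Hypothetical helper: the pattern is anchored, so one Match call is enough.
// Returns the named group's text, or null when the input does not match at all.
string? Extract(string input)
{
    Match m = rx.Match(input);
    return m.Success ? m.Groups["interest"].Value : null;
}

var items = new[] { "PRPCP2008", "PRrSV2012", "PRBP2006", "Foo" };
var values = items.Select(Extract).ToArray();

foreach (var (item, value) in items.Zip(values))
    Console.WriteLine($"{item} {value ?? "null"}");
```

Checking Success matters here: for a failed match, Groups["interest"].Value is an empty string rather than null.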

From what I see, you need to match multiple substrings with one regex. In that case, you need to un-anchor the pattern (i.e., remove the ^ and $):

(?<=PR)(?:[gpr])?([A-Z]{2,3})(?:vB)?(?=\d{4})

In C#:

var reg = new Regex(@"(?<=PR)(?:[gpr])?([A-Z]{2,3})(?:vB)?(?=\d{4})");
var matches = reg.Matches(str).Cast<Match>().Select(p => p.Value).ToList();

Then, you will have your 3 matches.
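A runnable sketch of the un-anchored approach. The snippet above does not show what str contains, so the input here is a hypothetical string holding the three codes separated by spaces. It collects both the whole-match values and group 1, since p.Value still carries the lowercase prefix.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: the three codes in one string, separated by spaces.
string str = "PRPCP2008 PRrSV2012 PRBP2006";

// Un-anchored pattern: no ^ in the lookbehind, no $ in the lookahead.
var reg = new Regex(@"(?<=PR)(?:[gpr])?([A-Z]{2,3})(?:vB)?(?=\d{4})");
var matchList = reg.Matches(str).Cast<Match>().ToList();

// p.Value is the whole match, so it still contains the lowercase prefix.
var values = matchList.Select(p => p.Value).ToList();
// Groups[1] holds only the uppercase letters.
var letters = matchList.Select(p => p.Groups[1].Value).ToList();

Console.WriteLine(string.Join(", ", values));   // PCP, rSV, BP
Console.WriteLine(string.Join(", ", letters));  // PCP, SV, BP
```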


UPDATE

You just need to use .Groups[1].Value to access the SV in PRrSV2012:

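using System;
using System.Text.RegularExpressions;

string regexPattern = @"(?<=^PR)(?:[gpr])?([A-Z]{2,3})(?:vB)?(?=\d{4}$)";
Regex regex = new Regex(regexPattern, RegexOptions.None);
Match match = regex.Match("PRrSV2012");
// Groups[1] is the text captured by ([A-Z]{2,3}), so this prints "SV"
Console.WriteLine(match.Groups[1].Value);
//                      ^^^^^^^^^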


MSDN does not state directly that a Match object includes the capturing groups, but it is implied, since Match.Value contains the whole matched text. Capture groups are parts of that text, so they have to be read from match.Groups after the match is found.
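If you would rather have Match.Value itself contain only the letters, one option (a different technique from the answers above, shown as a sketch) is to move the optional parts into lookarounds, so they are checked but not consumed. This relies on .NET supporting variable-length lookbehind; some other flavors, such as Python's re module, reject a quantifier like [gpr]? inside a lookbehind.

```csharp
using System;
using System.Text.RegularExpressions;

// Alternative pattern: the optional lowercase prefix moves into the lookbehind
// and the optional "vB" into the lookahead, so neither is consumed and
// Match.Value is just the 2-3 uppercase letters.
var regex = new Regex(@"(?<=^PR[gpr]?)[A-Z]{2,3}(?=(?:vB)?\d{4}$)");

string[] inputs = { "PRPCP2008", "PRrSV2012", "PRBP2006" };
var results = new string[inputs.Length];

for (int i = 0; i < inputs.Length; i++)
{
    Match m = regex.Match(inputs[i]);
    results[i] = m.Value;
    // Groups[0] is always the whole match, identical to m.Value.
    Console.WriteLine($"{inputs[i]}: {m.Value} (Groups[0] = {m.Groups[0].Value})");
}
```

With this pattern the whole match is PCP, SV and BP, so no group lookup is needed.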
