How do I find the Nth occurrence of a pattern with regex?

I have a string of numbers separated by some non-numeric character, like this: "16-15316-273". Is it possible to build a regex that returns the Nth match? I heard that ${n} might help, but it does not work for me, at least not in this expression:

// Example: I want to get 15316
var r = new Regex(@"(\d+)${1}");
var m = r.Match("16-15316-273");

(\d+)${0} returns 16, but (\d+)${1} gives me 273 instead of the expected 15316.

Both N (the ordinal of the number to extract) and the input string itself ("16-15316-273" is just an example) are dynamic values that may change while the app runs. The task is to build a regex in which the only thing that changes is N, and which is applicable to any such string.

Please do not offer solutions that need additional C# code, such as m.Groups[n] or Split; I am intentionally asking for a proper regex pattern. In short, I cannot modify the code for every new N value. All I can modify is the regex, which is built dynamically; N will be passed as a parameter to the method. Everything else is static and cannot be changed.

Maybe this expression will help you?

(?<=(\d+[^\d]+){1})\d+

You will need to modify {1} according to your N, i.e.:

(?<=(\d+[^\d]+){0})\d+ => 16
(?<=(\d+[^\d]+){1})\d+ => 15316
(?<=(\d+[^\d]+){2})\d+ => 273
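Since N arrives as a method parameter, the pattern above can be assembled by plain string concatenation. The following is only a minimal sketch: BuildNthPattern is a hypothetical helper name, N is treated as zero-based as in the three lines above, and the code is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original answer): builds the
// look-behind pattern for a zero-based index n by splicing n into {n}.
string BuildNthPattern(int n) =>
    @"(?<=(\d+[^\d]+){" + n.ToString(CultureInfo.InvariantCulture) + @"})\d+";

string input = "16-15316-273";
for (int n = 0; n <= 2; n++)
{
    string pattern = BuildNthPattern(n);
    // Regex.Match returns the leftmost match, which is the field at index n.
    Match m = Regex.Match(input, pattern);
    Console.WriteLine("{0} => {1}", pattern, m.Success ? m.Value : "(no match)");
}
```

The calling code stays the same for every N; only the pattern string changes.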

Your regular expression

(\d+)${1}

says to match this:

  • (\d+) : match 1 or more decimal digits, followed by
  • ${1} : match the atomic zero-width assertion "end of input string" exactly once.

One should note that the {1} quantifier is redundant since there's normally only one end-of-input-string (unless you've turned on the multiline option).

That's why you're matching 273: it's the longest sequence of digits anchored at end-of-string.
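To see that {n} here quantifies the $ anchor rather than selecting a group, the two patterns from the question can be run next to a plain $. This is a small sketch written as C# top-level statements; FirstMatch is a hypothetical helper name, and the values in the comments are the ones reported in the question.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "16-15316-273";

// Hypothetical helper: value of the first match of pattern in input.
string FirstMatch(string pattern) => Regex.Match(input, pattern).Value;

// "${0}" is "$" repeated zero times, so only (\d+) is left:
// the first run of digits (the question reports 16).
Console.WriteLine(FirstMatch(@"(\d+)${0}"));

// "${1}" is "$" exactly once, i.e. the same as a plain "$":
// the run of digits at the end of the input (the question reports 273).
Console.WriteLine(FirstMatch(@"(\d+)${1}"));
Console.WriteLine(FirstMatch(@"(\d+)$"));
```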

You need to use a zero-width positive look-behind assertion. To capture the Nth field in your string, you need to capture the string of digits that is preceded by N-1 fields. Given this source string:

string input = "1-22-333-4444-55555-666666-7777777-88888888-999999999" ;

The regular expression to match the 3rd field, where the first field is 1 rather than 0, looks like this:

(?<=^(\d+(-|$)){2})\d+

It says to

  • match the longest sequence of digits that is preceded by
    • start of text, followed by
    • a group, consisting of
      • 1 or more decimal digits, followed by
      • either a - or end-of-text
    • with that group repeated exactly 2 times
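The same breakdown can be written into the pattern itself with RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace and treats # as the start of a comment. This is only an illustrative sketch: it is written as C# top-level statements, and it uses non-capturing groups (?: ... ) where the pattern above uses capturing ones, which should not change what is matched.

```csharp
using System;
using System.Text.RegularExpressions;

// Commented form of the pattern for the 3rd field.
string pattern = @"
    (?<=                  # the match must be preceded by
        ^                 #   start of text, followed by
        (?: \d+           #   a group: 1 or more decimal digits,
            (?: - | $ )   #   followed by a hyphen or end-of-text,
        ){2}              #   with that group repeated exactly 2 times
    )
    \d+                   # the longest sequence of digits at that point
";

var rx = new Regex(pattern, RegexOptions.IgnorePatternWhitespace);
string input = "1-22-333-4444-55555-666666-7777777-88888888-999999999";

Match m = rx.Match(input);
Console.WriteLine(m.Success ? m.Value : "(no match)");   // expected: 333
```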

Here's a sample program:

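using System;
using System.Globalization;
using System.Text.RegularExpressions;

string src = "1-22-333-4444-55555-666666-7777777-88888888-999999999" ;
for ( int n = 1 ; n <= 10  ; ++n )
{
  // The Nth field (1-based) is preceded by N-1 complete fields.
  int n1       = n-1 ;
  string x     = n1.ToString(CultureInfo.InvariantCulture) ;
  string regex = @"(?<=^(\d+(-|$)){"+ x + @"})\d+" ;

  Console.Write( "regex: {0} ",regex);

  Regex rx = new Regex( regex ) ;
  Match m = rx.Match( src ) ;
  // N is right-aligned and N-1 is left-aligned, each in a 2-character column.
  Console.WriteLine( "N={0,2}, N-1={1,-2}, {2}" ,
    n ,
    n1 ,
    m.Success ? "success: " + m.Value : "failure" 
    ) ;
}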

It produces this output:

regex: (?<=^(\d+(-|$)){0})\d+ N= 1, N-1=0 , success: 1
regex: (?<=^(\d+(-|$)){1})\d+ N= 2, N-1=1 , success: 22
regex: (?<=^(\d+(-|$)){2})\d+ N= 3, N-1=2 , success: 333
regex: (?<=^(\d+(-|$)){3})\d+ N= 4, N-1=3 , success: 4444
regex: (?<=^(\d+(-|$)){4})\d+ N= 5, N-1=4 , success: 55555
regex: (?<=^(\d+(-|$)){5})\d+ N= 6, N-1=5 , success: 666666
regex: (?<=^(\d+(-|$)){6})\d+ N= 7, N-1=6 , success: 7777777
regex: (?<=^(\d+(-|$)){7})\d+ N= 8, N-1=7 , success: 88888888
regex: (?<=^(\d+(-|$)){8})\d+ N= 9, N-1=8 , success: 999999999
regex: (?<=^(\d+(-|$)){9})\d+ N=10, N-1=9 , failure

Try this:

string text = "16-15316-273";
Regex r = new Regex(@"\d+");
var m = r.Match(text, text.IndexOf('-'));

m.Value is 15316 ;)

 