
How do I find the Nth occurrence of a pattern with regex?

I have a string of numbers separated by some non-numeric character, like this: "16-15316-273". Is it possible to build a regex that returns the Nth matching group? I heard that ${n} might help, but it does not work for me, at least in this expression:

// Example: I want to get 15316
var r = new Regex(@"(\d+)${1}");
var m = r.Match("16-15316-273");

(\d+)${0} returns 16, but (\d+)${1} gives me 273 instead of the expected 15316.

So N, the position of the match to extract, and the input string itself ("16-15316-273" is just an example) are dynamic values that may change during app execution. The task is to build a regex in which the only thing that changes is N, and which is applicable to any such string.

Please do not offer solutions that need additional C# code such as m.Groups[n] or Split; I'm intentionally asking for a proper Regex pattern. In short, I cannot modify the code for every new N value. All I can modify is the regex, which is built dynamically; N will be passed as a parameter to the method. All the rest is static, with no way to change it.

Maybe this expression will help you?

(?<=(\d+[^\d]+){1})\d+

You will need to modify {1} according to your N (counting from 0), i.e.:

(?<=(\d+[^\d]+){0})\d+ => 16
(?<=(\d+[^\d]+){1})\d+ => 15316
(?<=(\d+[^\d]+){2})\d+ => 273
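
Since N arrives as a method parameter, the pattern string can be assembled from it. The following is a minimal sketch, assuming a zero-based N as in the examples above. The class name NthNumberPattern and the methods Build and Find are hypothetical names used only for illustration:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class NthNumberPattern
{
    // Builds the look-behind pattern for the zero-based index n:
    // the match must be preceded by n "digits + separator" groups.
    public static string Build(int n)
    {
        if (n < 0) throw new ArgumentOutOfRangeException(nameof(n));
        string count = n.ToString(CultureInfo.InvariantCulture);
        return @"(?<=(\d+[^\d]+){" + count + @"})\d+";
    }

    // Returns the first match for index n, or an empty string when
    // there is none (\d+ itself can never match an empty string).
    public static string Find(string input, int n)
    {
        Match m = Regex.Match(input, Build(n));
        return m.Success ? m.Value : string.Empty;
    }
}
```

For example, NthNumberPattern.Find("16-15316-273", 1) returns "15316". Only the number inside the braces changes between calls; the rest of the pattern stays fixed.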

Your regular expression

(\d+)${1}

says to match this:

  • (\d+) : match 1 or more decimal digits, followed by
  • ${1} : match the atomic zero-width assertion "end of input string" exactly once.

One should note that the {1} quantifier is redundant, since there's normally only one end-of-input-string (unless you've turned on the multiline option).

That's why you're matching '273': it's the longest sequence of digits anchored at end-of-string. By the same logic, ${0} repeats the anchor zero times, which leaves just (\d+), so it matches the first run of digits, 16.
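
To see that the braces after $ act as a quantifier on the anchor rather than as an "Nth match" selector, the quantified patterns can be compared with their plain equivalents. This is a small sketch; EndAnchorDemo and FirstMatch are hypothetical names, not part of the original answer:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EndAnchorDemo
{
    // Returns the value of the first match of pattern in input,
    // or an empty string if the pattern does not match.
    public static string FirstMatch(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : string.Empty;
    }
}
```

With the input "16-15316-273", FirstMatch(@"(\d+)${1}", input) and FirstMatch(@"(\d+)$", input) both give "273", matching what the question reports.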

You need to use a zero-width positive look-behind assertion. To capture the Nth field in your string, you need to capture the string of digits that is preceded by N-1 fields. Given this source string:

string input = "1-22-333-4444-55555-666666-7777777-88888888-999999999" ;

The regular expression to match the 3rd field, where the first field is numbered 1 rather than 0, looks like this:

(?<=^(\d+(-|$)){2})\d+

It says to

  • match the longest sequence of digits that is preceded by
    • start of text, followed by
    • a group, consisting of
      • 1 or more decimal digits, followed by
      • either a - or end-of-text
    • with that group repeated exactly 2 times
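
The same breakdown can be written into the pattern itself. The sketch below assumes the RegexOptions.IgnorePatternWhitespace option, which lets a .NET pattern span several lines and carry # comments; ThirdFieldDemo and Find are hypothetical names chosen for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ThirdFieldDemo
{
    // The 3rd-field pattern, spread over several lines so that
    // each part can carry its own comment.
    private const string Pattern = @"
        (?<=            # look behind for
          ^             #   start of text, followed by
          (             #   a group, consisting of
            \d+         #     1 or more decimal digits, followed by
            (-|$)       #     either a dash or end of text
          ){2}          #   with that group repeated exactly 2 times
        )
        \d+             # then match the digits of the 3rd field
    ";

    // Returns the 3rd dash-separated field, or an empty string if absent.
    public static string Find(string input)
    {
        Match m = Regex.Match(input, Pattern, RegexOptions.IgnorePatternWhitespace);
        return m.Success ? m.Value : string.Empty;
    }
}
```

For the source string above, ThirdFieldDemo.Find returns "333". Changing {2} to {N-1} selects a different field without touching the rest of the pattern.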

Here's a sample program:

using System;
using System.Globalization;
using System.Text.RegularExpressions;

string src = "1-22-333-4444-55555-666666-7777777-88888888-999999999" ;
for ( int n = 1 ; n <= 10  ; ++n )
{
  // The Nth field (1-based) is preceded by N-1 complete fields.
  int n1       = n-1 ;
  string x     = n1.ToString(CultureInfo.InvariantCulture) ;
  string regex = @"(?<=^(\d+(-|$)){"+ x + @"})\d+" ;

  Console.Write( "regex: {0} ",regex);

  Regex rx = new Regex( regex ) ;
  Match m = rx.Match( src ) ;
  // N is right-aligned and N-1 left-aligned, each in a 2-character column.
  Console.WriteLine( "N={0,2}, N-1={1,-2}, {2}" ,
    n ,
    n1 ,
    m.Success ? "success: " + m.Value : "failure" 
    ) ;
}

It produces this output:

regex: (?<=^(\d+(-|$)){0})\d+ N= 1, N-1=0 , success: 1
regex: (?<=^(\d+(-|$)){1})\d+ N= 2, N-1=1 , success: 22
regex: (?<=^(\d+(-|$)){2})\d+ N= 3, N-1=2 , success: 333
regex: (?<=^(\d+(-|$)){3})\d+ N= 4, N-1=3 , success: 4444
regex: (?<=^(\d+(-|$)){4})\d+ N= 5, N-1=4 , success: 55555
regex: (?<=^(\d+(-|$)){5})\d+ N= 6, N-1=5 , success: 666666
regex: (?<=^(\d+(-|$)){6})\d+ N= 7, N-1=6 , success: 7777777
regex: (?<=^(\d+(-|$)){7})\d+ N= 8, N-1=7 , success: 88888888
regex: (?<=^(\d+(-|$)){8})\d+ N= 9, N-1=8 , success: 999999999
regex: (?<=^(\d+(-|$)){9})\d+ N=10, N-1=9 , failure

Try this:

string text = "16-15316-273";
Regex r = new Regex(@"\d+");
var m = r.Match(text, text.IndexOf('-'));

The output (m.Value) is 15316 ;)


 