How do I parse a recurring pattern with regex?

I want to use regex to find an unknown number of arguments in a string. It is easier to show than to explain, so here is an example:

The regex: @ISNULL\\('(.*?)','(.*?)','(.*?)'\\)
The string: @ISNULL('1','2','3')
The result:

Group[0] "@ISNULL('1','2','3')" at 0 - 20 
Group[1] "1" at 9 - 10 
Group[2] "2" at 13 - 14  
Group[3] "3" at 17 - 18  

That works well. The problem begins when I need to find an unknown number of arguments (two or more).

What changes do I need to make to the regex in order to find all the arguments that occur in the string?

For example, parsing the string "@ISNULL('1','2','3','4','5','6')" should find all six arguments.
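To make the problem concrete, here is a small Java sketch of what the fixed three-group pattern does with longer input. Java is only an assumption here (the question does not name an engine), and the class and method names are made up for illustration. With six arguments the pattern still matches, but the lazy third group stretches to swallow everything up to the closing parenthesis; with two arguments it does not match at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedGroupsDemo {
    // The three-group pattern from the question, written as a Java string literal.
    static final Pattern FIXED =
        Pattern.compile("@ISNULL\\('(.*?)','(.*?)','(.*?)'\\)");

    // Returns the text of the third group, or null if the pattern does not match.
    static String thirdGroup(String input) {
        Matcher m = FIXED.matcher(input);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        System.out.println(thirdGroup("@ISNULL('1','2','3')"));             // 3
        System.out.println(thirdGroup("@ISNULL('1','2','3','4','5','6')")); // 3','4','5','6
        System.out.println(thirdGroup("@ISNULL('1','2')"));                 // null
    }
}
```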

If you don't know the number of potential matches in a repeated construct, you need a regex engine that supports captures in addition to capturing groups. A capturing group on its own retains only the last text it matched, whereas captures record every iteration of the group. Only .NET and Perl 6 offer this currently.
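A short Java sketch shows what happens in an engine without captures: when a capturing group sits inside a repeated group, only the last iteration is still available after the match. The class and method names below are hypothetical and exist only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastIterationDemo {
    // Group 2 sits inside group 1, and group 1 is repeated with +.
    static final Pattern REPEATED =
        Pattern.compile("@ISNULL\\(('([^']*)',?)+\\)");

    // Returns what group 2 holds after a full match, or null if there is no match.
    static String lastArgument(String input) {
        Matcher m = REPEATED.matcher(input);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Six arguments are matched, but only the final one is retained.
        System.out.println(lastArgument("@ISNULL('1','2','3','4','5','6')")); // 6
    }
}
```

The earlier arguments were matched along the way but are overwritten on each iteration, which is why Java needs a second pass to collect them.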

In C#:

  string pattern = @"@ISNULL\(('([^']*)',?)+\)";
  string input = @"@ISNULL('1','2','3','4','5','6')";
  Match match = Regex.Match(input, pattern);
  if (match.Success) {
     Console.WriteLine("Matched text: {0}", match.Value);
     for (int ctr = 1; ctr < match.Groups.Count; ctr++) {
        Console.WriteLine("   Group {0}:  {1}", ctr, match.Groups[ctr].Value);
        int captureCtr = 0;
        foreach (Capture capture in match.Groups[ctr].Captures) {
           Console.WriteLine("      Capture {0}: {1}", 
                             captureCtr, capture.Value);
           captureCtr++; 
        }
     }
  }   

In other regex flavors, you have to do it in two steps. E.g., in Java (code snippets courtesy of RegexBuddy):

First, find the part of the string you need:

Pattern regex = Pattern.compile("@ISNULL\\(('([^']*)',?)+\\)");
// or, using non-capturing groups: 
// Pattern regex = Pattern.compile("@ISNULL\\((?:'(?:[^']*)',?)+\\)");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
    ResultString = regexMatcher.group();
} 

Then use another regex to find and iterate over your matches:

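List<String> matchList = new ArrayList<String>();
try {
    Pattern regex = Pattern.compile("'([^']*)'");
    Matcher regexMatcher = regex.matcher(ResultString);
    while (regexMatcher.find()) {
        // group(1) is the argument text without the surrounding quotes
        matchList.add(regexMatcher.group(1));
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}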
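The two steps can be combined into one self-contained method. The following is a minimal sketch rather than a definitive implementation: the class name IsNullArguments and the method parse are made up for illustration, and it assumes that only the first @ISNULL(...) call in the subject string is of interest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IsNullArguments {
    // Step 1: locate the whole @ISNULL(...) call (non-capturing variant).
    private static final Pattern CALL =
        Pattern.compile("@ISNULL\\((?:'[^']*',?)+\\)");
    // Step 2: pick out each quoted argument inside that call.
    private static final Pattern ARGUMENT = Pattern.compile("'([^']*)'");

    // Returns the arguments of the first @ISNULL call, or an empty list if none is found.
    static List<String> parse(String subject) {
        List<String> arguments = new ArrayList<>();
        Matcher call = CALL.matcher(subject);
        if (call.find()) {
            Matcher argument = ARGUMENT.matcher(call.group());
            while (argument.find()) {
                arguments.add(argument.group(1));
            }
        }
        return arguments;
    }

    public static void main(String[] args) {
        System.out.println(parse("x = @ISNULL('1','2','3','4','5','6')")); // [1, 2, 3, 4, 5, 6]
    }
}
```

Restricting the second pass to the text matched in the first step keeps quoted strings elsewhere in the input from being picked up by mistake.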

This answer is somewhat speculative, as I have no clue what regex engine you are using. If the parameters are always numbers and always enclosed in single quotes, why not try the digit class, like this:

'(\d)+?'

This is just the \d class with the extraneous @ISNULL stuff removed, as I assume you are only interested in the parameters themselves. You may not need the +, and of course I don't know whether the engine you are using supports the lazy ? operator; just give it a go. Note that because the quantifier sits outside the group, the group holds only the last digit of a multi-digit parameter; '(\d+)' captures the whole number.
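As a rough Java sketch of the digit-class idea, assuming the arguments really are always unsigned integers in single quotes: the pattern is applied repeatedly with find(), so the number of arguments does not matter. The class name DigitArguments is hypothetical, and the quantifier is placed inside the group so that the group holds the whole number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitArguments {
    // One or more digits between single quotes; the quantifier is inside the group.
    private static final Pattern QUOTED_NUMBER = Pattern.compile("'(\\d+)'");

    // Collects every quoted number in the input, in order of appearance.
    static List<String> parse(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher m = QUOTED_NUMBER.matcher(input);
        while (m.find()) {
            numbers.add(m.group(1));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(parse("@ISNULL('1','22','333')")); // [1, 22, 333]
    }
}
```

Because the @ISNULL prefix is not checked, this picks up quoted numbers anywhere in the input, and non-numeric arguments are silently skipped.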
