
Capturing Arbitrary Multiple Groups with Regex

I am trying to use regular expressions in Java to capture data from the following string:

SettingName = "Value1",0x2,3,"Value4 contains spaces", "Value5 has a space before the string that is ignored"

The string may be preceded by an arbitrary amount of whitespace, which can be ignored. It may also contain more or fewer values than listed here; this is just an example.

My intent is to capture these groups:

  • Group 1: SettingName
  • Group 2: "Value1" (with the quotes)
  • Group 3: 0x2
  • Group 4: 3
  • Group 5: "Value4 contains spaces"
  • Group 6: "Value5 has a space before the string that is ignored"

The regex I am trying to use is:

    \s*([\w\/.-]+)\s*=(?:\s*(\"?[^\",]*\"?)(?:,|\s*$))+
    \s* -> Consume an arbitrary amount of whitespace
       ( -> Start a capturing group (group 1)
        [\w\/.-] -> Match one character of the SettingName, which may contain alphanumerics, _, /, ., and -
                + -> Repeat the previous token one or more times (so group 1 is not blank)
                 ) -> End the capturing group
                  \s* -> Consume an arbitrary amount of whitespace
                     = -> Consume the equals sign
                      (?: -> Start a non-capturing group
                         \s* -> Consume an arbitrary amount of whitespace
                            ( -> Start a capturing group (group 2)
                             \"? -> Consume a quote, if it exists
                                [^\",]* -> Consume any number of non-quote, non-comma characters
                                       \"? -> Consume the end quote, if it exists
                                          ) -> End the capturing group
                                           (?: -> Start a non-capturing group
                                              ,|\s*$ -> Match either a comma, or optional whitespace up to the end of the line/string
                                                    ) -> End the non-capturing group
                                                     ) -> End the outer non-capturing group
                                                      + -> Match the outer non-capturing group one or more times
    

I am using this code:

    private static final String regex = "\\s*([\\w\\/.-]+)\\s*=(?:\\s*(\"?[^\",]*\"?)(?:,|\\s*$))+";
    private static final Pattern settingPat = Pattern.compile(regex);
    ...
    public String text;
    public Matcher m;
    ...
    public void someMethod(String lineContents)
    {
        m = settingPat.matcher(text);
        if(!m.matches())
            ... (do other stuff)
        else
        {
            name = m.group(1);     // should be "SettingName"
            value[0] = m.group(2); // should be "\"Value1\""
            value[1] = m.group(3); // should be "0x2"
            ...
        }
    }
    

With this code, the pattern matches the string, but it seems that I am only capturing the last group. Do Java and/or regular expressions support capturing an arbitrary number of groups by repeating a group with the + quantifier?

You have only 2 capture groups, so you cannot get more than 2 groups in the result; a repeated capturing group keeps only the text of its last iteration. You will have to run a loop to match all the repetitions.
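To make the point about group counts concrete, here is a minimal sketch. The pattern is a simplified stand-in, not the asker's regex, and the class and method names are hypothetical. It shows that the number of groups is fixed by the pattern, and that a group inside a repeated construct reports only its final iteration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Illustrative pattern (an assumption, not the asker's regex):
    // a name, "=", then one or more comma-separated words.
    static final Pattern P = Pattern.compile("(\\w+)=(?:(\\w+),?)+");

    // Hypothetical helper: returns what group 2 holds after the repetition finishes.
    static String lastIteration(String input) {
        Matcher m = P.matcher(input);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        Matcher m = P.matcher("name=a,b,c");
        System.out.println(m.matches());    // true
        System.out.println(m.groupCount()); // 2: fixed by the pattern, not by the input
        System.out.println(m.group(1));     // name
        System.out.println(m.group(2));     // c (the captures "a" and "b" were overwritten)
    }
}
```

Calling `m.group(3)` on this matcher would throw an `IndexOutOfBoundsException`, since the pattern defines only two groups.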

You may use this regex in a while loop to get all matches:

    (?:([\w/.-]+)\h*=|(?!^)\G,)\h*((\"?)[^\",]*\3)
    

\G asserts the position at the end of the previous match, or at the start of the string for the first match. Because we use (?!^), we force \G to match only at the end of the previous match.
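The effect of \G is easier to see on a toy input. The sketch below uses simplified digit patterns, and the helper `findAll` is a hypothetical name rather than part of the answer. It compares an unanchored pattern, the same pattern anchored with \G, and the (?!^)\G variant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContiguousMatchDemo {
    // Hypothetical helper: collects group 1 of every successive find().
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "1,2,x,3";
        // Unanchored: find() skips over "x," and still reaches the 3.
        System.out.println(findAll("(\\d+),?", input));          // [1, 2, 3]
        // \G: every match must start where the previous one ended, so "x" stops the chain.
        System.out.println(findAll("\\G(\\d+),?", input));       // [1, 2]
        // (?!^)\G: the chain can never start at position 0, so nothing matches on its own.
        System.out.println(findAll("(?!^)\\G(\\d+),?", input));  // []
    }
}
```

The last case is why the regex above pairs (?!^)\G with a first alternative that matches the setting name: that alternative starts the chain, and \G then continues it one value at a time.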


Code:

    // Requires java.util.regex.Matcher and java.util.regex.Pattern; run inside a method.
    final String regex = "(?:([\\w/.-]+)\\h*=|(?!^)\\G,)\\h*((\"?)[^\",]*\\3)";
    final String string = "SettingName = \"Value1\",0x2,3,\"Value4 contains spaces\", \"Value5 has a space before the string that is ignored\"";

    final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
    final Matcher matcher = pattern.matcher(string);

    // Each find() yields one value; group 1 is non-null only on the first match.
    while (matcher.find()) {
        if (matcher.group(1) != null)
            System.out.println(matcher.group(1));       // setting name
        System.out.println("\t=> " + matcher.group(2)); // value, quotes included
    }
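
If the values are needed in a collection rather than printed, as with the `name` and `value[]` variables in the question, the same loop can fill a list. This is only a sketch: the class `SettingParser` and the method `parse` are hypothetical names, and putting the setting name in element 0 is an arbitrary choice made here for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SettingParser {
    // Regex taken from the answer above.
    private static final Pattern ENTRY = Pattern.compile(
            "(?:([\\w/.-]+)\\h*=|(?!^)\\G,)\\h*((\"?)[^\",]*\\3)");

    // Hypothetical helper: element 0 is the setting name, the rest are the values.
    // Returns an empty list if the line contains no "name = value" entry.
    static List<String> parse(String line) {
        List<String> parts = new ArrayList<>();
        Matcher m = ENTRY.matcher(line);
        while (m.find()) {
            if (m.group(1) != null) {
                parts.add(m.group(1)); // only the first match carries the name
            }
            parts.add(m.group(2));     // value, with its quotes if it had any
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("Name = \"a b\",0x2, \"c\""));
        // [Name, "a b", 0x2, "c"]
    }
}
```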
    
