Java Regex inconsistent groups

Please refer to the following question on SO:

Java: Regex not matching

My regex groups are not consistent. My code looks like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {

    public static void main(String[] args) {

        // final String VALUES_REGEX = "^\\{([0-9a-zA-Z\\-\\_\\.]+)(?:,\\s*([0-9a-zA-Z\\-\\_\\.]*))*\\}$";
        final String VALUES_REGEX = "\\{([\\w.-]+)(?:, *([\\w.-]+))*\\}";

        final Pattern REGEX_PATTERN = Pattern.compile(VALUES_REGEX);
        final String values = "{df1_apx.fhh.irtrs.d.rrr, ffd1-afp.farr.d.rrr.asgd, ffd2-afp.farr.d.rrr.asgd}";
        final Matcher matcher = REGEX_PATTERN.matcher(values);
        if (null != values && matcher.matches()) {
            // for (int index=1; index<=matcher.groupCount(); ++index) {
            // System.out.println(matcher.group(index));
            // }

            while (matcher.find()) {
                System.out.println(matcher.group());
            }
        }

    }
}

I tried the following combinations:

A) Regex as "^\\{([0-9a-zA-Z\\-\\_\\.]+)(?:,\\s*([0-9a-zA-Z\\-\\_\\.]*))*\\}$", iterating with groupCount(). Result:

df1_apx.fhh.irtrs.d.rrr

ffd2-afp.farr.d.rrr.asgd

B) Regex as "^\\{([0-9a-zA-Z\\-\\_\\.]+)(?:,\\s*([0-9a-zA-Z\\-\\_\\.]*))*\\}$", using matcher.find(). Result: no results.

C) Regex as "\\{([\\w.-]+)(?:, *([\\w.-]+))*\\}", iterating with groupCount(). Result:

df1_apx.fhh.irtrs.d.rrr

ffd2-afp.farr.d.rrr.asgd

D) Regex as "\\{([\\w.-]+)(?:, *([\\w.-]+))*\\}", using matcher.find(). Result: no results.

I never get consistent groups. The expected result here is:

df1_apx.fhh.irtrs.d.rrr

ffd1-afp.farr.d.rrr.asgd

ffd2-afp.farr.d.rrr.asgd

Please let me know how I can achieve this.

(?<=[{,])\s*(.*?)(?=,|})

You can simply use this and grab the captures. See the demo:

https://regex101.com/r/sJ9gM7/33
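As a rough sketch of how that pattern could be used from Java: the class and method names below (`LookaroundExtract`, `extract`) are made up for illustration, and the backslashes are doubled because the pattern sits in a Java string literal. Each call to `find()` yields one item in capture group 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class LookaroundExtract {

    // The answer's pattern: an item is whatever lies between "{" or ","
    // on the left and "," or "}" on the right, minus leading whitespace.
    private static final Pattern ITEM =
            Pattern.compile("(?<=[{,])\\s*(.*?)(?=,|})");

    static List<String> extract(String input) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        while (m.find()) {          // one find() per item
            items.add(m.group(1));  // group 1 excludes the leading whitespace
        }
        return items;
    }

    public static void main(String[] args) {
        String values = "{df1_apx.fhh.irtrs.d.rrr, ffd1-afp.farr.d.rrr.asgd, ffd2-afp.farr.d.rrr.asgd}";
        for (String item : extract(values)) {
            System.out.println(item);  // prints the three items, one per line
        }
    }
}
```

Note that this only extracts items; it does not check that the whole string is well formed.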

When you have (#something)*, the regex engine remembers only the last repetition of the group. You won't get all the groups this way.
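A small demonstration of this, using the question's regex and input (the class and method names are invented for the example). It also shows why combinations B and D print nothing: after a successful `matches()`, the next `find()` resumes at the end of the previous match, where nothing is left to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original post.
class RepeatedGroupDemo {

    // Returns groups 1..groupCount() after a full match, or an empty array.
    static String[] groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i);
        }
        return groups;
    }

    // Mirrors the question's code: matches() first, then find().
    static boolean findAfterMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() && m.find();
    }

    public static void main(String[] args) {
        String regex = "\\{([\\w.-]+)(?:, *([\\w.-]+))*\\}";
        String values = "{df1_apx.fhh.irtrs.d.rrr, ffd1-afp.farr.d.rrr.asgd, ffd2-afp.farr.d.rrr.asgd}";

        // The pattern has only two groups, however often group 2 repeats;
        // group 2 holds the last repetition, so the middle item is lost.
        for (String g : groupsOf(regex, values)) {
            System.out.println(g);
        }

        // false: find() starts after the text that matches() already consumed.
        System.out.println(findAfterMatches(regex, values));
    }
}
```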

The problem is that you are trying to do two things at the same time:

  • you want to validate the string format
  • you want to extract each item (with an unknown number of items)

So it's not possible with the matches method alone: when a capture group is repeated, each new capture overwrites the previous one, and only the last survives.

One possible way is to use the find method to obtain each item and the contiguity anchor \\G to check the format. \\G ensures that the current match immediately follows the previous match or starts at the beginning of the string:

(?:\\G(?!\\A),\\s*|\\A\\{)([\\w.-]+)(}\\z)?

Pattern details:

(?:                  # two possible starting points:
    \\G(?!\\A),\\s*  # contiguous to a previous match
                     # (but not at the start of the string)
  |                  # OR
    \\A\\{           # the start of the string
)
([\\w.-]+)           # an item in the capture group 1
(}\\z)?              # the optional capture group 2 to check
                     # that the end of the string has been reached

So, to check the format of the string from start to end, all you need to do is test whether capture group 2 is present in the last match.
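One way this could look in Java, as a sketch rather than a definitive implementation: the class name `ContiguousParser`, the `parse` method and the choice to return an `Optional` are assumptions made for the example. The loop collects group 1 on every match and remembers whether group 2 took part in the most recent one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the \G approach.
class ContiguousParser {

    private static final Pattern ITEM =
            Pattern.compile("(?:\\G(?!\\A),\\s*|\\A\\{)([\\w.-]+)(}\\z)?");

    // Returns the items if the whole string is well formed, otherwise empty.
    static Optional<List<String>> parse(String input) {
        List<String> items = new ArrayList<>();
        boolean closed = false;
        Matcher m = ITEM.matcher(input);
        while (m.find()) {
            items.add(m.group(1));
            // Group 2 is non-null only when "}" followed by end of input was seen.
            closed = m.group(2) != null;
        }
        // If the last match did not reach "}" at the very end, the format is
        // invalid (a gap, a stray character, or a missing closing brace).
        return closed ? Optional.of(items) : Optional.empty();
    }

    public static void main(String[] args) {
        String values = "{df1_apx.fhh.irtrs.d.rrr, ffd1-afp.farr.d.rrr.asgd, ffd2-afp.farr.d.rrr.asgd}";
        parse(values).ifPresent(items -> items.forEach(System.out::println));

        // A malformed string yields an empty Optional instead of partial items.
        System.out.println(parse("{a, b; c}").isPresent());  // false
    }
}
```

Because \\G ties every match to the end of the previous one, the first unexpected character stops the loop, and the group 2 check then reports the string as invalid.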