Java Regex. group excluding delimiters

I'm trying to split my string using a regex. It should include even zero-length matches before and after every delimiter. For example, if the delimiter is ^ and my string is ^^^, I expect to get 4 zero-length groups. I cannot use just regex = "([^\\\\^]*)" because it will include extra zero-length matches after every true match between delimiters. So I decided to match the non-delimiter symbols that follow either the beginning of the line or a delimiter. It works perfectly on https://regex101.com/ (I'm sorry, I couldn't find a share option on this website to share my example), but in IntelliJ IDEA it skips one match.

So, now my code is:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String regex = "(^|\\^)([^\\^]*)";
final String string = "^^^^";

final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);

// Print the span and text of Group 2 for every match.
while (matcher.find()) {
    System.out.println("[" + matcher.start(2) + "-" + matcher.end(2) + "]: \"" + matcher.group(2) + "\"");
}

I expect 5 empty-string matches, but I get only 4:

[0-0]: ""
[2-2]: ""
[3-3]: ""
[4-4]: ""

The question is: why does it skip the [1-1] match, and how can I fix it?

Your regex matches either the start of a line or a ^ (capturing that into Group 1) and then zero or more characters other than ^ into Group 2. When the first match is found (at the start of the string), Group 1 holds an empty string (as it is the start of the string) and Group 2 also holds an empty string (as the first char is ^, and [^^]* can match an empty string before a non-matching char). The whole match is zero-length, so the regex engine advances its index by one position before searching again. So, after the first match, the regex index is moved from the start of the string to the position after the first ^. Then the second match is found: the second ^ and the empty string after it. Hence, the first ^ is never matched; it is skipped.
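To make the skip visible, here is a small sketch that records the span of the whole match and the content of Group 1 for the same regex and input. The class name SkipTrace and the helper trace are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkipTrace {
    // Hypothetical helper: records the whole-match span and Group 1 for every match.
    static List<String> trace(String input) {
        Pattern pattern = Pattern.compile("(^|\\^)([^\\^]*)", Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(input);
        List<String> spans = new ArrayList<>();
        while (matcher.find()) {
            spans.add("[" + matcher.start() + "-" + matcher.end() + "] group1=\""
                    + matcher.group(1) + "\"");
        }
        return spans;
    }

    public static void main(String[] args) {
        // Prints:
        // [0-0] group1=""
        // [1-2] group1="^"
        // [2-3] group1="^"
        // [3-4] group1="^"
        trace("^^^^").forEach(System.out::println);
    }
}
```

The first match is empty at offset 0 and the next one already starts at offset 1, so the ^ at offset 0 is never consumed as a delimiter and the empty field at [1-1] is never reported.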

The simple solution is to use split:

String[] result = string.split("\\^", -1);

The second argument, a negative limit, makes the method keep all trailing empty strings in the resulting array (with the default limit of 0 they are discarded).
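The effect of the limit argument is easiest to see side by side. The sketch below compares the one-argument call with the -1 call; the class name SplitLimitDemo and the sample input "a^^b^" are chosen only for illustration.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    public static void main(String[] args) {
        String input = "a^^b^";

        // Default limit (0): trailing empty strings are dropped.
        String[] trimmed = input.split("\\^");
        // Negative limit: trailing empty strings are kept.
        String[] full = input.split("\\^", -1);

        System.out.println(Arrays.toString(trimmed)); // [a, , b]
        System.out.println(Arrays.toString(full));    // [a, , b, ]

        // With an input made only of delimiters, the default call returns an empty array.
        System.out.println("^^^^".split("\\^").length);     // 0
        System.out.println("^^^^".split("\\^", -1).length); // 5
    }
}
```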

See a Java demo:

String str = "^^^^";
String[] result = str.split("\\^", -1);
System.out.println("Number of items: " + result.length);
for (String s : result) {
    System.out.println("\"" + s + "\"");
}

Output:

Number of items: 5
""
""
""
""
""
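If the offsets of each field are also needed, as in the question's output, split does not report them. One possible alternative, which is not part of the answer above, is to keep Matcher.find() but move the delimiter into a lookbehind so that it is never consumed. This is only a sketch under that assumption; the class name FieldSpans and the method spans are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldSpans {
    // A field starts at the beginning of the input or right after a '^'.
    // The lookbehind checks this without consuming the delimiter.
    // Without MULTILINE, '^' outside the character class means start of input.
    private static final Pattern FIELD = Pattern.compile("(?<=^|\\^)[^\\^]*");

    static List<String> spans(String input) {
        List<String> out = new ArrayList<>();
        Matcher matcher = FIELD.matcher(input);
        while (matcher.find()) {
            out.add("[" + matcher.start() + "-" + matcher.end() + "]: \""
                    + matcher.group() + "\"");
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints five empty fields, [0-0] through [4-4].
        spans("^^^^").forEach(System.out::println);
    }
}
```

Because every match is now the field itself, the engine's one-position advance after an empty match lands exactly on the next field instead of stepping over a delimiter.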
