
Java regex: How do I find the exact matching group?

Code:

String in = "text2";
Pattern pat = Pattern.compile("((?:text1))|((?:text2))");
Matcher mat = pat.matcher(in);
if(mat.find())
{
     //print the matching group number 
     //without any iteration
     //here the answer is group 2.
}

My pattern is ((?:text1))|((?:text2)). When I match "text2" against it, mat.group(1) is null (that group did not participate in the match) and mat.group(2) is text2.

So my input matches capturing group 2 of the pattern.

My question is: without any iteration, is there a way to find out exactly which group matched?

Given a regex (group1)|(group2)|(group3)|...|(groupn), it is not possible to tell which group matched the text without going through at least (n - 1) groups and checking whether each one captured some text or is null.

You can, however, avoid the overhead of String construction by calling Matcher.start(int group) and checking whether the returned index is non-negative (greater than or equal to 0).
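The start-index check described above could look roughly like the sketch below. The class name MatchedGroupFinder and the helper firstMatchedGroup are hypothetical names chosen for illustration; only Pattern, Matcher, Matcher.groupCount() and Matcher.start(int) come from java.util.regex. The loop is still a linear scan, it just avoids building a String for every group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchedGroupFinder {

    // Hypothetical helper: returns the number of the first capturing group
    // that participated in the current match, or -1 if none did.
    // Must be called only after a successful find() or matches().
    public static int firstMatchedGroup(Matcher m) {
        for (int g = 1; g <= m.groupCount(); g++) {
            // start(g) is -1 when group g did not take part in the match,
            // so no String has to be constructed for the check.
            if (m.start(g) >= 0) {
                return g;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Pattern pat = Pattern.compile("(text1)|(text2)");
        Matcher mat = pat.matcher("text2");
        if (mat.find()) {
            System.out.println(firstMatchedGroup(mat)); // prints 2
        }
    }
}
```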


By the way, this is the source code of Matcher.group(int group) in Oracle's implementation (version 8-b123):

public String group(int group) {
    if (first < 0)
        throw new IllegalStateException("No match found");
    if (group < 0 || group > groupCount())
        throw new IndexOutOfBoundsException("No group " + group);
    if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
        return null;
    return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
}

Compare that with Matcher.start(int group), also from Oracle's implementation (version 8-b123):

public int start(int group) {
    if (first < 0)
        throw new IllegalStateException("No match available");
    if (group < 0 || group > groupCount())
        throw new IndexOutOfBoundsException("No group " + group);
    return groups[group * 2];
}

Theoretically, it is possible to tell which group matched the text by checking only O(log n) capturing groups. You can do so by wrapping group 1 to group (n div 2) in one extra capturing group and group (n div 2 + 1) to group n in another, then repeating this inside each half, which creates a search tree. You then find the group that matched by following, at each level, the branch that has a match. However, I advise against doing this, since the logic is quite complex and error-prone (the group numbers shift whenever an enclosing capturing group is added, and the number of groups is not always a power of 2).
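As a rough illustration of the search-tree idea, here is a minimal sketch for exactly four alternatives. The literals text3 and text4, the class name GroupSearchTree and its methods are assumptions made up for this example; the group numbers are hard-coded, which is precisely the fragility the answer warns about.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupSearchTree {

    // Group numbering after adding the two wrapper groups:
    //   1 = (text1|text2) wrapper, 2 = text1, 3 = text2,
    //   4 = (text3|text4) wrapper, 5 = text3, 6 = text4
    private static final Pattern PAT =
            Pattern.compile("((text1)|(text2))|((text3)|(text4))");

    // Returns which of the four alternatives (1..4) matched,
    // using two start() checks instead of up to three.
    public static int matchedAlternative(Matcher m) {
        if (m.start(1) >= 0) {               // left half: text1 or text2
            return m.start(2) >= 0 ? 1 : 2;
        }
        return m.start(5) >= 0 ? 3 : 4;      // right half: text3 or text4
    }

    // Convenience wrapper: -1 when the input contains no match at all.
    public static int find(String input) {
        Matcher m = PAT.matcher(input);
        return m.find() ? matchedAlternative(m) : -1;
    }

    public static void main(String[] args) {
        System.out.println(find("text3")); // prints 3
    }
}
```

For four alternatives this saves a single check, so the approach only starts to pay off, if at all, with a large number of alternatives.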

It is not possible to do this, unfortunately. You could, I suppose, hack it for simple cases like your example, e.g.:

if (mat.find()) {
    int group = (mat.group(1) == null ? 2 : 1);
}

But this doesn't gain you much, and you'll always have to make at least n-1 comparisons for n groups (assuming a match was found). Note that the above is still 1 group check for 2 groups.

If you don't want to rely on the ordering of groups you could use named capture groups. While this doesn't actually accomplish your goal, it does give you the flexibility of being able to reorder the groups in your regex without having to modify the integer values in your code to match.
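A minimal sketch of the named-group variant, assuming Java 7 or later (where the (?<name>...) syntax and Matcher.group(String) are available). The group names first and second, the class NamedGroupExample and the method matchedName are hypothetical and chosen only for this example. It still checks the groups one by one, but the code no longer depends on their positions in the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupExample {

    private static final Pattern PAT =
            Pattern.compile("(?<first>text1)|(?<second>text2)");

    // The names must be listed by hand; the order here does not have to
    // follow the order of the groups in the pattern.
    private static final String[] NAMES = {"first", "second"};

    // Returns the name of the group that matched, or null if nothing matched.
    public static String matchedName(String input) {
        Matcher m = PAT.matcher(input);
        if (!m.find()) {
            return null;
        }
        for (String name : NAMES) {
            // group(name) is null when that group did not participate.
            if (m.group(name) != null) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(matchedName("text2")); // prints second
    }
}
```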
