
Pattern matcher returns unexpected values

Could you take a look at the pattern (Java) in the code below? I expected the result to give me 3 group values, but it gave me only 2. I think I missed something.

package com.mycompany.testapp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 *
 * @author bo17a
 */
public class NewClass {
    public static void main(String[] args){
        Pattern mPattern = Pattern.compile("^love (.*?) way you (.*?)$");
        Matcher matcher = mPattern.matcher("love the way you lie");
        if(matcher.find()){
            String[] match_groups = new String[matcher.groupCount()];
            System.out.println(String.format("groupCount: %d", matcher.groupCount()));
            for(int j = 0;j<matcher.groupCount();j++){
                System.out.println(String.format("j %d",j));
                match_groups[j] = matcher.group(j);
                System.out.println(match_groups[j]);
            }
        }

    }
}

The result I got is:

2
love the way you lie
the

but my expected result should be:

3
love the way you lie
the
lie

Update
I tried adding one to the number of groups, as suggested in the replies:

/*
 * To change this license header, choose License Headers in Project Properties.
 * To change this template file, choose Tools | Templates
 * and open the template in the editor.
 */
package com.mycompany.testapp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 *
 * @author lee
 */
public class NewClass {
    public static void main(String[] args){
        Pattern mPattern = Pattern.compile("^love (.*?) way you (.*?)$");
        Matcher matcher = mPattern.matcher("love the way you lie");
        if(matcher.find()){
            String[] match_groups = new String[matcher.groupCount()];
            System.out.println(String.format("groupCount: %d", matcher.groupCount()));
            for(int j = 0;j<=matcher.groupCount();j++){
                System.out.println(String.format("j %d",j));
                match_groups[j] = matcher.group(j);
                System.out.println(match_groups[j]);
            }
        }
    }

}

The result is different from yours:

groupCount: 2
j 0
love the way you lie
j 1
the
j 2
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
    at com.mycompany.testapp.NewClass.main(NewClass.java:24)
Command execution failed.

I have tried this on Windows 10 with JDK 8 and on macOS with JDK 8. Could this be a bug in Java, since you and I get different results for the same code?

My guess is that you also want to capture the full match as a group:

^(love (.*?) way you (.*?))$

Or just add 1 to your counter:

Test 1

import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class RegularExpression{

    public static void main(String[] args){

        Pattern mPattern = Pattern.compile("^love (.*?) way you (.*?)$");
        Matcher matcher = mPattern.matcher("love the way you lie");
        if(matcher.find()){
            String[] match_groups = new String[matcher.groupCount() + 1];
            System.out.println(String.format("groupCount: %d", matcher.groupCount() + 1));
            for(int j = 0;j<matcher.groupCount() + 1;j++){
                System.out.println(String.format("j %d",j));
                match_groups[j] = matcher.group(j);
                System.out.println(match_groups[j]);
            }
        }

    }
}

Output

groupCount: 3
j 0
love the way you lie
j 1
the
j 2
lie

Test 2

import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class RegularExpression{

    public static void main(String[] args){

        final String regex = "^(love (.*?) way you (.*?))$";
        final String string = "love the way you lie";

        final Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
        final Matcher matcher = pattern.matcher(string);

        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }

    }
}

Output

Full match: love the way you lie
Group 1: love the way you lie
Group 2: the
Group 3: lie
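The difference between the two tests comes down to what groupCount() counts: the number of capturing groups the pattern defines, never including group 0. Here is a small sketch to check that; GroupCountDemo and countGroups are hypothetical names made up for this illustration. It calls groupCount() on a matcher built from an empty input, which works because the count comes from the compiled pattern rather than from a successful match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {

    // Hypothetical helper: number of capturing groups defined by the regex.
    // groupCount() depends only on the pattern, so no find() call is needed.
    static int countGroups(String regex) {
        Matcher m = Pattern.compile(regex).matcher("");
        return m.groupCount();
    }

    public static void main(String[] args) {
        // Two capturing groups; group 0 (the whole match) is not counted.
        System.out.println(countGroups("^love (.*?) way you (.*?)$"));   // prints 2
        // Wrapping the whole expression in parentheses adds a third group.
        System.out.println(countGroups("^(love (.*?) way you (.*?))$")); // prints 3
    }
}
```

So the extra "Group 1" in Test 2 is the outer parentheses, while group 0 is still available in both versions.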

RegEx Circuit

jex.im visualizes regular expressions:


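matcher.groupCount() returns 2 here because it counts only the capturing groups in the pattern; group 0 (the whole match) is not included. So your original code constructs an array of only two strings and copies the groups with index less than 2 into it, which are group 0 (the whole string) and group 1 ("the"). In your update you changed the loop bound to j <= matcher.groupCount() but left the array at matcher.groupCount() elements, so storing group 2 in match_groups[2] throws ArrayIndexOutOfBoundsException; this is not a bug in Java. If you add 1 to matcher.groupCount() everywhere in your code (both the array size and the loop bound), it works as expected.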
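Applied to the code from the question, that fix might look roughly like the sketch below: the array gets one extra slot for group 0 and the loop uses <=, so the array size and the loop bound agree. AllGroups and allGroups are hypothetical names chosen for this example, and returning an empty array when there is no match is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllGroups {

    // Hypothetical helper: collects group 0 (the whole match) plus every
    // capturing group. Returns an empty array if the input does not match.
    static String[] allGroups(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            return new String[0];
        }
        // groupCount() excludes group 0, so the array needs one extra slot.
        String[] groups = new String[matcher.groupCount() + 1];
        // Use <= so the last capturing group is included.
        for (int j = 0; j <= matcher.groupCount(); j++) {
            groups[j] = matcher.group(j);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = allGroups("^love (.*?) way you (.*?)$", "love the way you lie");
        System.out.println("groups: " + groups.length);
        for (String g : groups) {
            System.out.println(g);
        }
    }
}
```

Running main prints "groups: 3" followed by "love the way you lie", "the" and "lie", which is the output the question expected.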

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM