Reusing consumed characters in pattern matching in Java?

Consider the following pattern:

aba

And the following source string:

abababbbaba

01234567890    //Index Positions

Using the Pattern and Matcher classes from the java.util.regex package, this pattern is found only twice, because the regex engine does not reconsider characters it has already consumed.
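To make that behaviour concrete, here is a minimal sketch that reproduces it. The class name NonOverlappingDemo and the helper findStarts are made up for illustration; only Pattern, Matcher, find() and start() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonOverlappingDemo {
    // Hypothetical helper: collects the start index of every match
    // that a plain find() loop reports.
    static List<Integer> findStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        // Each find() resumes after the end of the previous match,
        // so the "aba" starting at index 2 is never tried.
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(findStarts("aba", "abababbbaba")); // prints [0, 8]
    }
}
```

The first match consumes indices 0 to 2, so the search resumes at index 3 and the occurrence at index 2 is skipped.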

What if I want to reuse some of the already consumed characters? That is, I want 3 matches here: one at position 0, one at 2 (which was skipped previously), and one at 8.

How do I do it?

I think you can use String.indexOf() for something like that.

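String str = "abababbbaba";
String substr = "aba";
int location = 0;
// indexOf returns -1 once there are no further occurrences
while ((location = str.indexOf(substr, location)) >= 0) {
    System.out.println(location);
    location++; // advance by one character so overlapping occurrences are found
}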

Prints:

0
2
8

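You can use a lookahead for that. The pattern (.)(?=(..)) consumes only one character, captured in group(1), while the lookahead captures the next two characters in group(2) without consuming them. Concatenating the two groups gives every substring of length 3 in the sentence you are searching, and each one can then be compared with the needle.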

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Question8968432 {
    public static void main(String[] args) {
        final String needle = "aba";
        final String sentence = "abababbbaba";
        // Consume one character; peek at the next two without consuming them.
        final Matcher m = Pattern.compile("(.)(?=(..))").matcher(sentence);
        while (m.find()) {
            // Rebuild the 3-character window that starts at m.start().
            final String match = m.group(1) + m.group(2);
            // Mark the window in the sentence with square brackets.
            final String hint = String.format("%s[%s]%s",
                sentence.substring(0, m.start()), match,
                sentence.substring(m.start() + match.length()));
            if (match.equals(needle)) {
                System.out.printf("Found %s starting at %d: %s%n",
                    match, m.start(), hint);
            }
        }
    }
}

Output:

Found aba starting at 0: [aba]babbbaba
Found aba starting at 2: ab[aba]bbbaba
Found aba starting at 8: abababbb[aba]

You can skip the final String hint part; it is only there to show what is matched and where.

If you can change the regular expression, you can simply use the following:

a(?=ba)
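As a rough sketch of how that pattern could be used with the question's input: the class name LookaheadDemo and the helper overlappingStarts are invented for this example. Because the lookahead does not consume "ba", each match is just the single character "a", so the full 3-character occurrence has to be taken from the input by index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Matches an 'a' only if "ba" follows; the "ba" itself is not consumed.
    private static final Pattern ABA = Pattern.compile("a(?=ba)");

    // Hypothetical helper: start indices of all occurrences of "aba",
    // including overlapping ones.
    static List<Integer> overlappingStarts(String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = ABA.matcher(input);
        while (m.find()) {
            // m.group() is only "a" here; m.start() is where "aba" begins.
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "abababbbaba";
        for (int start : overlappingStarts(input)) {
            // Recover the whole occurrence from the input (length of "aba" is 3).
            System.out.println(start + ": " + input.substring(start, start + 3));
        }
    }
}
```

Only the leading "a" is consumed, so the next search resumes one character later and the occurrence at index 2 is no longer skipped.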

