How can I count the number of matches for a regex?

Let's say I have a string which contains this:

HelloxxxHelloxxxHello

I compile a pattern to look for 'Hello':

Pattern pattern = Pattern.compile("Hello");
Matcher matcher = pattern.matcher("HelloxxxHelloxxxHello");

It should find three matches. How can I get a count of how many matches there were?

I've tried various loops and using matcher.groupCount(), but it didn't work.

matcher.find() does not find all matches, only the next match.

Solution for Java 9+

long matches = matcher.results().count();
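As a minimal, self-contained sketch of the Java 9+ approach (the class name ResultsCount and the helper countMatches are made up for illustration): Matcher.results() returns a stream of MatchResult objects, one per non-overlapping match, so count() on that stream gives the number of matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and method names, for illustration only. Requires Java 9+.
class ResultsCount {
    // Counts the non-overlapping matches of regex in input.
    static long countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.results().count();
    }

    public static void main(String[] args) {
        System.out.println(countMatches("Hello", "HelloxxxHelloxxxHello")); // prints 3
    }
}
```

Note that results() does not reset the matcher, so calling it on a matcher that has already been advanced with find() counts only the remaining matches.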

Solution for Java 8 and older

You'll have to do the following. (Starting from Java 9, there is a nicer solution.)

int count = 0;
while (matcher.find())
    count++;

By the way, matcher.groupCount() is something completely different: it returns the number of capturing groups in the pattern, not the number of matches.
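To make the difference concrete, here is a small sketch (the class name GroupCountDemo and its helper methods are hypothetical). groupCount() depends only on the parentheses in the pattern, while the number of matches depends on the input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and method names, for illustration only.
class GroupCountDemo {
    // Number of capturing groups in the pattern; independent of the input.
    static int groups(String regex, String input) {
        return Pattern.compile(regex).matcher(input).groupCount();
    }

    // Number of non-overlapping matches found in the input.
    static int matches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "HelloxxxHelloxxxHello";
        System.out.println(groups("Hello", input));      // prints 0: no capturing groups
        System.out.println(groups("(He)(llo)", input));  // prints 2: two capturing groups
        System.out.println(matches("(He)(llo)", input)); // prints 3: three matches
    }
}
```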

Complete example:

import java.util.regex.*;

class Test {
    public static void main(String[] args) {
        String hello = "HelloxxxHelloxxxHello";
        Pattern pattern = Pattern.compile("Hello");
        Matcher matcher = pattern.matcher(hello);

        int count = 0;
        while (matcher.find())
            count++;

        System.out.println(count);    // prints 3
    }
}

Handling overlapping matches

When counting matches of aa in aaaa, the above snippet will give you 2:

aaaa
aa
  aa

To get 3 matches, i.e. this behavior:

aaaa
aa
 aa
  aa

You have to search for a match at index <start of last match> + 1, as follows:

String hello = "aaaa";
Pattern pattern = Pattern.compile("aa");
Matcher matcher = pattern.matcher(hello);

int count = 0;
int i = 0;
while (matcher.find(i)) {
    count++;
    i = matcher.start() + 1;
}

System.out.println(count);    // prints 3
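An alternative sketch, not taken from the answer above: wrapping the regex in a zero-width lookahead, (?=...), makes every match empty, so a plain find() loop ends up visiting each start position at which the original regex matches. The class and method names are hypothetical, and the approach assumes a regex that can be embedded in a lookahead group as-is.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and method names, for illustration only.
class OverlapCount {
    // Counts the positions at which regex matches, allowing overlaps.
    static int countOverlapping(String regex, String input) {
        // The lookahead consumes no characters, so after each (empty) match
        // find() resumes one character further along.
        Matcher matcher = Pattern.compile("(?=" + regex + ")").matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlapping("aa", "aaaa"));     // prints 3
        System.out.println(countOverlapping("aa", "aaaaaaaa")); // prints 7
    }
}
```

Compared with the find(int) loop, this variant avoids tracking the index by hand, at the cost of modifying the pattern.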

This should work for matches that might overlap:

public static void main(String[] args) {
    String input = "aaaaaaaa";
    String regex = "aa";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(input);
    int from = 0;
    int count = 0;
    while(matcher.find(from)) {
        count++;
        from = matcher.start() + 1;
    }
    System.out.println(count);
}

Starting with Java 9, you can use the stream provided by Matcher.results():

long matches = matcher.results().count();

If you want to use Java 8 streams and are allergic to while loops, you could try this:

public static int countPattern(String references, Pattern referencePattern) {
    Matcher matcher = referencePattern.matcher(references);
    return Stream.iterate(0, i -> i + 1)
            .filter(i -> !matcher.find())
            .findFirst()
            .get();
}

Disclaimer: this only works for disjoint matches.

Example:

public static void main(String[] args) {
    Pattern referencePattern = Pattern.compile("PASSENGER:\\d+");
    System.out.println(countPattern("[ \"PASSENGER:1\", \"PASSENGER:2\", \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
    System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
    System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\", \"PASSENGER:1\" ]", referencePattern));
    System.out.println(countPattern("[  ]", referencePattern));
}

This prints out:

2
0
1
0

This is a stream-based solution that also counts overlapping (non-disjoint) matches, because each search restarts one character after the start of the previous match:

public static int countPattern(String references, Pattern referencePattern) {
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(
            new Iterator<Integer>() {
                Matcher matcher = referencePattern.matcher(references);
                int from = 0;

                @Override
                public boolean hasNext() {
                    return matcher.find(from);
                }

                @Override
                public Integer next() {
                    from = matcher.start() + 1;
                    return 1;
                }
            },
            Spliterator.IMMUTABLE), false).reduce(0, (a, c) -> a + c);
}

Use the code below to count the matches that the regex finds in your input:

Pattern p = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL); // "regex" is your predefined regex
Matcher m = p.matcher(input); // "input" is the string to search
int count = 0;
// find() alone is enough; calling matches() first would consume the matcher's
// state and can miscount when the whole input happens to match.
while (m.find())
    count++;

This is generic code rather than a specific solution, so tailor it to suit your needs.
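A note on matches() versus find(), since the two are easy to confuse when counting: matches() succeeds only if the entire input matches the regex, while find() scans for the next matching subsequence. A rough sketch of the difference (class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and method names, for illustration only.
class MatchesVsFind {
    // True only if the whole input matches the regex.
    static boolean wholeInputMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Counts non-overlapping occurrences of the regex anywhere in the input.
    static int countOccurrences(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(wholeInputMatches("Hello", "HelloxxxHello")); // prints false
        System.out.println(wholeInputMatches("Hello", "Hello"));         // prints true
        System.out.println(countOccurrences("Hello", "HelloxxxHello"));  // prints 2
    }
}
```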

Please feel free to correct me if there is any mistake.
