Count regex matches with streams

Question

I am trying to count the number of matches of a regex pattern with a simple Java 8 lambdas/streams based solution. For example, for this pattern/matcher:

final Pattern pattern = Pattern.compile("\\d+");
final Matcher matcher = pattern.matcher("1,2,3,4");

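There is the method splitAsStream, which splits the text on the given pattern instead of matching the pattern. Although it is elegant and preserves immutability, it is not always correct: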

// count is 4, correct
final long count = pattern.splitAsStream("1,2,3,4").count();

// count is 0, wrong
final long countSingle = pattern.splitAsStream("1").count();

I also tried (ab)using an IntStream. The problem is that I have to guess how many times to call matcher.find(), instead of calling it until it returns false.

final long count = IntStream
        .iterate(0, i -> matcher.find() ? 1 : 0)
        .limit(100)
        .sum();

I am familiar with the traditional solution while (matcher.find()) count++; where count has to be mutable. Is there a simple way to do this with Java 8 lambdas/streams?

Answer 1

To use Pattern::splitAsStream properly you have to invert your regex. That means that instead of \\d+ (which splits on every number) you should use \\D+. This gives you every number in the String.

final Pattern pattern = Pattern.compile("\\D+");
// count is 4
long count = pattern.splitAsStream("1,2,3,4").count();
// count is 1
count = pattern.splitAsStream("1").count();
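
One caveat with the inverted pattern, as far as I can tell: when the input starts with a non-digit, splitAsStream emits a leading empty string, so the plain count comes out one too high. Below is a minimal sketch that filters the empty pieces out. The class and method names are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class InvertedSplitCount {
    private static final Pattern NON_DIGITS = Pattern.compile("\\D+");

    // Counts runs of digits by splitting on runs of non-digits and
    // discarding the empty piece that a leading separator produces.
    static long countNumbers(String input) {
        return NON_DIGITS.splitAsStream(input)
                .filter(piece -> !piece.isEmpty())
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countNumbers("1,2,3,4")); // 4
        System.out.println(countNumbers(",1,2"));    // 2 (3 without the filter)
        System.out.println(countNumbers("foobar"));  // 0
    }
}
```

The filter is only safe here because a run of digits can never be empty; for patterns that can match the empty string this trick would not carry over.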

Answer 2

The rather contrived language in the javadoc of Pattern.splitAsStream is probably to blame.

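The stream returned by this method contains each substring of the input sequence that is terminated by another subsequence that matches this pattern or is terminated by the end of the input sequence.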

If you print out all of the matches of 1,2,3,4 you may be surprised to notice that it is actually returning the commas, not the numbers.

    System.out.println("[" + pattern.splitAsStream("1,2,3,4")
            .collect(Collectors.joining("!")) + "]");

prints [!,!,!,]. The odd bit is why it gives you 4 and not 3: the extra element is the empty string in front of the first match.

Obviously this also explains why "1" gives 0: there are no strings between numbers in that string.

A quick demo:

private void test(Pattern pattern, String s) {
    System.out.println(s + "-[" + pattern.splitAsStream(s)
            .collect(Collectors.joining("!")) + "]");
}

public void test() {
    final Pattern pattern = Pattern.compile("\\d+");
    test(pattern, "1,2,3,4");
    test(pattern, "a1b2c3d4e");
    test(pattern, "1");
}

prints

1,2,3,4-[!,!,!,]
a1b2c3d4e-[a!b!c!d!e]
1-[]

Answer 3

You can extend AbstractSpliterator to solve this:

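import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.StreamSupport;

// Emits one element (the group count) per successful Matcher.find() call.
static class SpliterMatcher extends AbstractSpliterator<Integer> {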
    private final Matcher m;

    public SpliterMatcher(Matcher m) {
        super(Long.MAX_VALUE, NONNULL | IMMUTABLE);
        this.m = m;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Integer> action) {
        boolean found = m.find();
        if (found)
            action.accept(m.groupCount());
        return found;
    }
}

final Pattern pattern = Pattern.compile("\\d+");

Matcher matcher = pattern.matcher("1");
long count = StreamSupport.stream(new SpliterMatcher(matcher), false).count();
System.out.println("Count: " + count); // 1

matcher = pattern.matcher("1,2,3,4");
count = StreamSupport.stream(new SpliterMatcher(matcher), false).count();
System.out.println("Count: " + count); // 4


matcher = pattern.matcher("foobar");
count = StreamSupport.stream(new SpliterMatcher(matcher), false).count();
System.out.println("Count: " + count); // 0
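
If the matched text is needed and not just the count, the same idea can emit a MatchResult per match. This is a sketch of a variant of the spliterator above, not the answer's own code; the class name MatchStreams and the method matches are hypothetical.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical variant that exposes each match instead of the group count.
final class MatchStreams {

    // Wraps a Matcher in a sequential stream with one MatchResult per find().
    static Stream<MatchResult> matches(Pattern pattern, CharSequence input) {
        Matcher matcher = pattern.matcher(input);
        Spliterator<MatchResult> spliterator =
                new Spliterators.AbstractSpliterator<MatchResult>(
                        Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
                    @Override
                    public boolean tryAdvance(Consumer<? super MatchResult> action) {
                        if (!matcher.find()) {
                            return false;
                        }
                        // toMatchResult() takes a snapshot that stays valid
                        // after the matcher advances to the next match.
                        action.accept(matcher.toMatchResult());
                        return true;
                    }
                };
        return StreamSupport.stream(spliterator, false);
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(matches(digits, "1,2,3,4").count()); // 4
        matches(digits, "a1b22c333")
                .map(MatchResult::group)
                .forEach(System.out::println); // 1, 22, 333 on separate lines
    }
}
```

The stream is kept sequential on purpose: a Matcher is stateful, so this spliterator is not meant to be split across threads.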

Answer 4

In short, you have a stream of String and a String pattern: how many of those strings match this pattern?

final String myString = "1,2,3,4";
Long count = Arrays.stream(myString.split(","))
      .filter(str -> str.matches("\\d+"))
      .count();

The first line can be any other way of getting a stream, such as List<String>().stream(), ...

Am I wrong?
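
One difference worth noting: String.matches requires the whole token to match, so this counts comma-separated tokens made up entirely of digits, not occurrences of the pattern. The two numbers agree for "1,2,3,4" but diverge for less regular input. A small sketch of the comparison, with hypothetical class and method names:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison, not part of the original answer.
final class TokenVersusMatchCount {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Counts comma-separated tokens that consist only of digits.
    static long countMatchingTokens(String input) {
        return Arrays.stream(input.split(","))
                .filter(token -> DIGITS.matcher(token).matches())
                .count();
    }

    // Counts every occurrence of the pattern anywhere in the input.
    static long countOccurrences(String input) {
        Matcher matcher = DIGITS.matcher(input);
        long count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatchingTokens("1,2,3,4")); // 4
        System.out.println(countOccurrences("1,2,3,4"));    // 4
        System.out.println(countMatchingTokens("a1b2,3"));  // 1
        System.out.println(countOccurrences("a1b2,3"));     // 3
    }
}
```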

Answer 5

Java 9

You can use Matcher#results() to get hold of all matches:

Stream<MatchResult> results()
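Returns a stream of match results for each subsequence of the input sequence that matches the pattern. The match results occur in the same order as the matching subsequences in the input sequence.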
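
Applied to the counting question, that could look like the sketch below. It needs Java 9 or later; the class name ResultsCount and its methods are hypothetical helpers for illustration.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Requires Java 9+ for Matcher.results(); helper names are hypothetical.
final class ResultsCount {

    // Counts the matches of the pattern in the input.
    static long countMatches(Pattern pattern, CharSequence input) {
        return pattern.matcher(input).results().count();
    }

    // Collects the matched text of every match, in input order.
    static List<String> matchedText(Pattern pattern, CharSequence input) {
        return pattern.matcher(input)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(countMatches(digits, "1,2,3,4")); // 4
        System.out.println(countMatches(digits, "1"));       // 1
        System.out.println(matchedText(digits, "a1b22c"));   // [1, 22]
    }
}
```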

Java 8 and lower

Another simple solution based on using a reverse pattern:

String pattern = "\\D+";
System.out.println("1".replaceAll("^" + pattern + "|" + pattern + "$", "").split(pattern, 0).length); // => 1

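Here, all non-digits are removed from the start and end of the string, and then the string is split on non-digit sequences without reporting any trailing empty elements (since 0 is passed as the limit argument to split).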

See this demo:

String pattern = "\\D+";
System.out.println("1".replaceAll("^" + pattern + "|" + pattern + "$", "").split(pattern, 0).length);    // => 1
System.out.println("1,2,3".replaceAll("^" + pattern + "|" + pattern + "$", "").split(pattern, 0).length);// => 3
System.out.println("hz 1".replaceAll("^" + pattern + "|" + pattern + "$", "").split(pattern, 0).length); // => 1
System.out.println("1 hz".replaceAll("^" + pattern + "|" + pattern + "$", "").split(pattern, 0).length); // => 1
System.out.println("xxx 1 223 zzz".replaceAll("^" + pattern + "|" + pattern + "$", "").split(pattern, 0).length);//=>2
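
One edge case the demo does not cover, as far as I can tell: for input with no digits at all the trimmed string is empty, and split returns a one-element array for an empty string, so the expression reports 1 where 0 is expected. A sketch with a guard for that case, using hypothetical names:

```java
// Hypothetical wrapper around the trim-and-split expression above.
final class TrimAndSplitCount {

    // Counts runs of digits; returns 0 when the input contains none.
    static int countNumbers(String input) {
        String separators = "\\D+";
        // Strip leading and trailing non-digits, as in the original expression.
        String trimmed = input.replaceAll("^" + separators + "|" + separators + "$", "");
        // "".split(...) yields a single empty element, so handle that case first.
        return trimmed.isEmpty() ? 0 : trimmed.split(separators, 0).length;
    }

    public static void main(String[] args) {
        System.out.println(countNumbers("1"));             // 1
        System.out.println(countNumbers("xxx 1 223 zzz")); // 2
        System.out.println(countNumbers("foobar"));        // 0
    }
}
```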

5 answers

Answer 1
4 accepted 2015-12-30 15:41:41

Answer 2
3 2015-12-30 15:00:27

Answer 3
3 2015-12-30 15:29:09

Answer 4
1 2015-12-30 14:43:08

Answer 5
0 2015-12-30 21:47:10
