How to extract data from a string with regex

Question

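Why do I have to repeat the regex pattern three times in this code to find three separate numbers? I would like to use just ".*(\\d{10}+).*" to find every number in the string word, but I have to repeat it three times. What am I doing wrong?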

    public static void main(String[] args) {
        String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
        word = word.replaceAll("\\s+", "");

        Pattern pat = Pattern.compile(".*(\\d{10}+).*" + ".*(\\d{10}+).*" + ".*(\\d{10}+).*");
        Matcher mat = pat.matcher(word);

        while (mat.find()) {
            for (int i = 1; i <= mat.groupCount(); i++) {
                System.out.println(mat.group(i));
            }
        }
    }

Answer 1

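This happens because .* is a greedy pattern (see Regex Quantifiers): it consumes as much of the string as it can while still allowing the overall match to succeed. In your case the leading .* swallows every number except the last one, so the group captures only that last number.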
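The greedy behaviour described above can be checked with a small sketch. The class and method names here are hypothetical and only serve the illustration; the pattern is the single, unrepeated one from the question, without the possessive +.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns what group 1 captures when the input is matched by ".*(\\d{10}).*",
    // or null if there is no match.
    static String captureWithGreedyPrefix(String input) {
        Matcher m = Pattern.compile(".*(\\d{10}).*").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541"
                .replaceAll("\\s+", "");
        // The leading .* backtracks only as far as it must, so the group
        // ends up holding the last 10-digit run in the string.
        System.out.println(captureWithGreedyPrefix(word)); // 5467892541
    }
}
```

Only one number comes back, and it is the last one, because that single match already spans the entire string.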

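To fix this, drop the .* parts that force the pattern to match the whole string; find() already returns each match in turn and skips whatever lies between them.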

So using just (\\d{10}) should work.

    public static void main(String[] args) {
        String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
        word = word.replaceAll("\\s+", "");

        Pattern pat = Pattern.compile("(\\d{10})");
        Matcher mat = pat.matcher(word);

        while (mat.find()) {
            for (int i = 1; i <= mat.groupCount(); i++) {
                System.out.println(mat.group(i));
            }
        }
    }

Answer 2

@Hesham Attia's answer is simple and solves your problem; this answer only adds more explanation of how it differs from your original pattern.

Let's add the group index i to the output of the code:

    public static void main(String[] args) {
        String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
        word = word.replaceAll("\\s+", "");

        Pattern pat = Pattern.compile("(\\d{10})");
        Matcher mat = pat.matcher(word);

        while (mat.find()) {
            // The pattern declares one capturing group, so i is always 1 here.
            for (int i = 1; i <= mat.groupCount(); i++) {
                System.out.println("Group-" + i + ": " + mat.group(i));
            }
        }
    }

You will get this result:

    Group-1: 0546105610
    Group-1: 4515189675
    Group-1: 5467892541

whereas your original pattern prints:

    Group-1: 0546105610
    Group-2: 4515189675
    Group-3: 5467892541

In fact, the code above with the new pattern "(\\d{10})" is equivalent to the following:

    public static void main(String[] args) {
        String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
        word = word.replaceAll("\\s+", "");

        Pattern pat = Pattern.compile("\\d{10}");
        Matcher mat = pat.matcher(word);

        while (mat.find()) {
            // group() with no argument returns the whole match (group 0).
            System.out.println(mat.group());
        }
    }

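If you consult the Javadoc for Matcher.find(), Matcher.group() and Matcher.groupCount(), you will find that Matcher.find() attempts to find the next subsequence of the input that matches the pattern, Matcher.group() returns the subsequence matched by the previous match, and Matcher.groupCount() does not count the whole match (group 0), only the capturing groups specified in the pattern.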
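As a rough illustration of those three methods, the sketch below compares groupCount() for the patterns discussed in this thread and shows that group() and group(0) refer to the same text. The class name, the helper groupsIn and the shortened input string are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Number of capturing groups declared in the regex; group 0 is not counted.
    static int groupsIn(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupsIn("(\\d{10})")); // 1
        System.out.println(groupsIn("\\d{10}"));   // 0

        Matcher m = Pattern.compile("(\\d{10})").matcher("a0546105610,4515189675");
        while (m.find()) {
            // Here the single capturing group spans the whole match,
            // so group(), group(0) and group(1) all return the same text.
            System.out.println(m.group() + " " + m.group(0).equals(m.group(1)));
        }
    }
}
```

groupCount() depends only on the pattern, which is why the loop bound in the earlier code never changes between calls to find().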

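In short, the regex engine walks through the subject sequence with your pattern and tries to match as much as possible (greedy mode). Now let's look at the differences between the patterns: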

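Your original pattern, ".*(\\d{10}+).*"+".*(\\d{10}+).*"+".*(\\d{10}+).*", and why you need to repeat it three times:
If only ".*(\\d{10}+).*" is given, the pattern matches the whole string in one go. Because the leading .* is greedy, the matched parts are:
- "Somerandommobilenumbers0546105610,4515189675," matches the leading .*
- "5467892541" matches \\d{10}+ and goes into group 1
- the empty string matches the trailing .*
The whole string is consumed by the first match, so a second find() has nothing left to match and you cannot extract the other two numbers. That is why you had to repeat the pattern, with one capturing group per number.
Pattern "(\\d{10})":
Each call to mat.find() matches one 10-digit sequence, puts it into group 1 and returns. You then read the result from group 1, which is why the group index is always 1.
Pattern "\\d{10}":
Same as "(\\d{10})", but there is no capturing group, so you read the result directly from mat.group(), which is group 0.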
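Putting the last pattern to work, a minimal sketch of a reusable helper might look like this. The class name ExtractNumbers and the method extractAll are hypothetical; the approach is the same find() loop as above, with the matches collected into a list instead of printed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractNumbers {
    private static final Pattern TEN_DIGITS = Pattern.compile("\\d{10}");

    // Collects every 10-digit run found after whitespace is removed from the text.
    static List<String> extractAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = TEN_DIGITS.matcher(text.replaceAll("\\s+", ""));
        while (m.find()) {
            found.add(m.group()); // group 0: the whole match of this find()
        }
        return found;
    }

    public static void main(String[] args) {
        String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
        System.out.println(extractAll(word)); // [0546105610, 4515189675, 5467892541]
    }
}
```

The number of results follows the input, so nothing in the pattern has to be repeated when the string contains more or fewer numbers.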

Answer 3

The real problem is that you are using Pattern at all: it requires a lot of code and is therefore error-prone. Here is how to do it in one simple line:

String[] numbers = word.replaceAll("[^\\d,]", "").split(",");
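To see what this one-liner produces, and where it may surprise you, here is a small sketch. The wrapper splitNumbers and the second input string are made up for the illustration. Note that this approach relies on the commas as separators and does not check that each piece has 10 digits.

```java
import java.util.Arrays;

class SplitDemo {
    // Drops everything except digits and commas, then splits on the commas.
    static String[] splitNumbers(String text) {
        return text.replaceAll("[^\\d,]", "").split(",");
    }

    public static void main(String[] args) {
        String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
        System.out.println(Arrays.toString(splitNumbers(word)));
        // [0546105610, 4515189675, 5467892541]

        // Hypothetical input with a stray digit in the prose:
        // the digit is glued onto the first number.
        String other = "Call 2 numbers 0546 105 610, 451 518 9675";
        System.out.println(Arrays.toString(splitNumbers(other)));
        // [20546105610, 4515189675]
    }
}
```

For input shaped exactly like the question's string the result is the same as with the Matcher loop; the difference only shows up when the surrounding text itself contains digits or commas.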

Answer 1: score 1, accepted, 2017-03-06 01:59:13

Answer 2: score 1, 2017-03-06 03:27:03

Answer 3: score -1, 2017-03-06 09:12:34
