Java Regular Expressions: Pattern and Matcher

Question

I'm trying to parse this string using a Matcher: "2+30*4+(5+6)*7"

using this pattern: "\\d*|[()+*-]"

For some reason, the Matcher finds the numbers correctly, but when I iterate over the matches it doesn't split out the operators: it returns empty strings for everything except the digits:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "2+30*4+(5+6)*7";
Pattern p = Pattern.compile("\\d*|[()+*-]");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.print("Start index: " + m.start());
    System.out.print(" End index: " + m.end() + " ");
    System.out.println("-----> " + m.group());
}

This gives the following output:

Start index: 0 End index: 1 -----> 2
Start index: 1 End index: 1 -----> 
Start index: 2 End index: 4 -----> 30
Start index: 4 End index: 4 -----> 
Start index: 5 End index: 6 -----> 4
Start index: 6 End index: 6 -----> 
Start index: 7 End index: 7 -----> 
Start index: 8 End index: 9 -----> 5
Start index: 9 End index: 9 -----> 
Start index: 10 End index: 11 -----> 6
Start index: 11 End index: 11 -----> 
Start index: 12 End index: 12 -----> 
Start index: 13 End index: 14 -----> 7
Start index: 14 End index: 14 ----->

I don't understand why, for example, in the second line the end index is 1 (and not 2), resulting in an empty string: Start index: 1 End index: 1 ----->

By the way, when I change the pattern's order to "[()+*-]|\\d*", it works fine...

Answer 1

Empty strings are allowed by \\d* since it means zero or more digits. If you don't want to find strings that have zero digits (that is, empty strings), change \\d* to \\d+.
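To see the difference in isolation, here is a minimal sketch that reports the first match each quantifier finds in "+30", which is the text the matcher faces right after the first token of the question's input. The class and method names are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only
public class StarVsPlus {
    // Returns the first match of regex in input, or null if find() fails
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \d* succeeds at index 0 of "+30" by matching zero digits
        System.out.println("[" + firstMatch("\\d*", "+30") + "]"); // prints []
        // \d+ needs at least one digit, so the search moves past '+' to "30"
        System.out.println("[" + firstMatch("\\d+", "+30") + "]"); // prints [30]
    }
}
```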

Demo

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "2+30*4+(5+6)*7";
Pattern p = Pattern.compile("\\d+|[()+*-]");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.print("Start index: " + m.start());
    System.out.print(" End index: " + m.end() + " ");
    System.out.println("-----> " + m.group());
}

Output:

Start index: 0 End index: 1 -----> 2
Start index: 1 End index: 2 -----> +
Start index: 2 End index: 4 -----> 30
Start index: 4 End index: 5 -----> *
Start index: 5 End index: 6 -----> 4
Start index: 6 End index: 7 -----> +
Start index: 7 End index: 8 -----> (
Start index: 8 End index: 9 -----> 5
Start index: 9 End index: 10 -----> +
Start index: 10 End index: 11 -----> 6
Start index: 11 End index: 12 -----> )
Start index: 12 End index: 13 -----> *
Start index: 13 End index: 14 -----> 7
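If you need the tokens as a collection rather than printed, the same find() loop can fill a list. A minimal sketch; ExprTokenizer and tokenize are hypothetical names, not part of any library:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only
public class ExprTokenizer {
    // One or more digits, or a single operator/parenthesis
    private static final Pattern TOKEN = Pattern.compile("\\d+|[()+*-]");

    // Collects every match in order; characters TOKEN does not match are skipped
    public static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [2, +, 30, *, 4, +, (, 5, +, 6, ), *, 7]
        System.out.println(tokenize("2+30*4+(5+6)*7"));
    }
}
```

Because find() simply skips characters the pattern does not match, an input such as "2 + 3" would yield [2, +, 3] with the spaces dropped.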

If you are not interested in the positions of your tokens, you can also split before or after each of + - * / ( ), like this:

String s = "2+30*4+(5+6)*7";
String[] tokens = s.split("(?<=[+\\-*/()])|(?=[+\\-*/()])");
for (String token : tokens)
    System.out.println(token);

Output:

2
+
30
*
4
+
(
5
+
6
)
*
7
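Note that the split pattern lists '/' among its delimiters, while the character class [()+*-] used with find() does not. A quick sketch comparing the two on an input containing a division (the class name SplitVsFind and the input "8/2-(3+1)" are illustrative choices, not from the answer) suggests adding '/' to the class if division is needed:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class, for illustration only
public class SplitVsFind {
    // Collects all matches of regex in s, in order
    public static List<String> findAll(String regex, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "8/2-(3+1)";
        // No '/' in the class, so the slash is silently skipped: [8, 2, -, (, 3, +, 1, )]
        System.out.println(findAll("\\d+|[()+*-]", s));
        // '/' added; '-' kept last so it stays a literal: [8, /, 2, -, (, 3, +, 1, )]
        System.out.println(findAll("\\d+|[()+*/-]", s));
        // The split version already treats '/' as a delimiter: [8, /, 2, -, (, 3, +, 1, )]
        System.out.println(Arrays.asList(s.split("(?<=[+\\-*/()])|(?=[+\\-*/()])")));
    }
}
```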

Answer 2

\\d* matches zero or more digits. So after the first match, the matcher is looking at "+30*4+(5+6)*7", and the first thing it asks is, "Does this string begin with zero or more digits? By golly, yes it does!" (It checks this first because \\d* appears first in the pattern.) That's why the matcher returns an empty string (a string of zero digits).
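The effect of alternation order can be checked directly. Below is a small sketch (AlternationOrder and matches are illustrative names) that records each match as start-end:text, so empty matches stay visible, and runs both orders from the question on the shorter input "2+3":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only
public class AlternationOrder {
    // Records each match as "start-end:text"
    public static List<String> matches(String regex, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // \d* tried first: at index 1 it succeeds with zero digits, so '+' is never matched
        System.out.println(matches("\\d*|[()+*-]", "2+3")); // [0-1:2, 1-1:, 2-3:3, 3-3:]
        // Operators tried first: '+' is matched, but \d* still matches empty at the end
        System.out.println(matches("[()+*-]|\\d*", "2+3")); // [0-1:2, 1-2:+, 2-3:3, 3-3:]
    }
}
```

This is consistent with the reordered pattern appearing to work, with one caveat: as long as \\d* can match nothing, the loop still produces one empty match at the end of the input, which \\d+ avoids.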

Changing it to \\d+, which matches one or more digits, should work.
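Why not keep \\d* and simply ignore the empty results? The question's output suggests this would not help: after the empty match at 1-1, the next match starts at 2, because find() resumes one character further on after an empty match, so the '+' at index 1 is never tried. A minimal sketch (FilterEmptyMatches and nonEmptyMatches are hypothetical names) makes that visible:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only
public class FilterEmptyMatches {
    // Keeps only the non-empty matches of regex in s
    public static List<String> nonEmptyMatches(String regex, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {
            if (!m.group().isEmpty()) {
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "2+30*4+(5+6)*7";
        // Filtering does not bring the operators back: [2, 30, 4, 5, 6, 7]
        System.out.println(nonEmptyMatches("\\d*|[()+*-]", s));
        // \d+ never matches empty, so every token survives
        System.out.println(nonEmptyMatches("\\d+|[()+*-]", s));
    }
}
```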

Answer 3

Your regex \\d*|[()+*-] can be represented as:

[Image: diagram of the regex \\d*|[()+*-]]

Its \\d* branch matches zero or more digits, so it can match an empty string.

You need to change it to match one or more digits with the regex \\d+|[()+*-], which can be represented as:

[Image: diagram of the regex \\d+|[()+*-]]

3 answers

Answer 1: score 2, accepted, 2013-12-09 23:58:40
Answer 2: score 1, 2013-12-10 00:00:20
Answer 3: score 1, 2013-12-10 03:12:30
