
Java regex Pattern and Matcher

I am trying to tokenize this string using a Matcher: "2+30*4+(5+6)*7"

using this Pattern: "\\d*|[()+*-]"

For some reason, the Matcher finds the right positions in the string, but when I iterate over the matches, every token except the digits comes back as an empty string:

String s = "2+30*4+(5+6)*7";
Pattern p = Pattern.compile("\\d*|[()+*-]");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.print("Start index: " + m.start());
    System.out.print(" End index: " + m.end() + " ");
    System.out.println("-----> " + m.group());
}

This gives the following output:

Start index: 0 End index: 1 -----> 2
Start index: 1 End index: 1 -----> 
Start index: 2 End index: 4 -----> 30
Start index: 4 End index: 4 -----> 
Start index: 5 End index: 6 -----> 4
Start index: 6 End index: 6 -----> 
Start index: 7 End index: 7 -----> 
Start index: 8 End index: 9 -----> 5
Start index: 9 End index: 9 -----> 
Start index: 10 End index: 11 -----> 6
Start index: 11 End index: 11 -----> 
Start index: 12 End index: 12 -----> 
Start index: 13 End index: 14 -----> 7
Start index: 14 End index: 14 -----> 

I don't understand why, for example, the end index in the second line is 1 (and not 2), resulting in an empty string: Start index: 1 End index: 1 ----->

By the way, when I change the order of the pattern to "[()+*-]|\\d*" it works fine...

Empty strings are allowed by \\d* since it means zero or more digits. If you don't want to find strings that have zero digits (that is, empty strings), change \\d* to \\d+.

Demo

String s = "2+30*4+(5+6)*7";
Pattern p = Pattern.compile("\\d+|[()+*-]");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.print("Start index: " + m.start());
    System.out.print(" End index: " + m.end() + " ");
    System.out.println("-----> " + m.group());
}

Output:

Start index: 0 End index: 1 -----> 2
Start index: 1 End index: 2 -----> +
Start index: 2 End index: 4 -----> 30
Start index: 4 End index: 5 -----> *
Start index: 5 End index: 6 -----> 4
Start index: 6 End index: 7 -----> +
Start index: 7 End index: 8 -----> (
Start index: 8 End index: 9 -----> 5
Start index: 9 End index: 10 -----> +
Start index: 10 End index: 11 -----> 6
Start index: 11 End index: 12 -----> )
Start index: 12 End index: 13 -----> *
Start index: 13 End index: 14 -----> 7
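
If you want the tokens themselves rather than printed indices, the same find() loop can collect each m.group() into a list. This is only a minimal sketch: the class name TokenDemo and the helper tokenize are hypothetical names chosen for illustration and do not come from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenDemo {
    // One or more digits, or a single operator/parenthesis.
    private static final Pattern TOKEN = Pattern.compile("\\d+|[()+*-]");

    // Hypothetical helper: returns every token in the order it appears.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [2, +, 30, *, 4, +, (, 5, +, 6, ), *, 7]
        System.out.println(tokenize("2+30*4+(5+6)*7"));
    }
}
```

Note that find() silently skips any character that matches neither alternative (a space or a /, for example), so a stricter tokenizer would probably want to detect and reject such input.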

If you are not interested in the positions of your tokens, you can also split before or after each of + - * / ( ), like this:

String s = "2+30*4+(5+6)*7";
String[] tokens = s.split("(?<=[+\\-*/()])|(?=[+\\-*/()])");
for (String token : tokens)
    System.out.println(token);

Output:

2
+
30
*
4
+
(
5
+
6
)
*
7
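
The split above keeps the operators because lookbehind and lookahead are zero-width: they mark a position without consuming any characters. The sketch below contrasts that with splitting on the operator characters themselves, which throws them away. The class and method names (SplitDemo, splitDroppingOperators, splitKeepingOperators) are hypothetical and used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

class SplitDemo {
    // The operator is consumed as the delimiter, so it is lost,
    // and adjacent operators such as "+(" leave an empty string between them.
    static List<String> splitDroppingOperators(String s) {
        return Arrays.asList(s.split("[+\\-*/()]"));
    }

    // Zero-width lookbehind/lookahead: split next to each operator, consuming nothing.
    static List<String> splitKeepingOperators(String s) {
        return Arrays.asList(s.split("(?<=[+\\-*/()])|(?=[+\\-*/()])"));
    }

    public static void main(String[] args) {
        String s = "2+30*4+(5+6)*7";
        // prints [2, 30, 4, , 5, 6, , 7]
        System.out.println(splitDroppingOperators(s));
        // prints [2, +, 30, *, 4, +, (, 5, +, 6, ), *, 7]
        System.out.println(splitKeepingOperators(s));
    }
}
```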

\\d* matches zero or more digits. So after the first match, the matcher is looking at "+30*4+(5+6)*7", and the first thing it asks is, "Does this string begin with zero or more digits? By golly, yes it does!" (It checks this first, because \\d* appears first in the pattern.) That's why the matcher returns an empty string (a string of zero digits). After an empty match, the next find() resumes one character further along, so the + is skipped without ever being tried against [()+*-].

Changing it to \\d+, which matches one or more digits, should work.
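
To see the effect of alternation order and of * versus +, you can count how many empty matches each variant produces on the same input. This is a rough sketch rather than anything from the original answers; AlternationDemo and countEmptyMatches are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Hypothetical helper: counts the matches whose text is the empty string.
    static int countEmptyMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int empty = 0;
        while (m.find()) {
            if (m.group().isEmpty()) {
                empty++;
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        String s = "2+30*4+(5+6)*7";
        // \d* is tried first and succeeds with zero digits at every operator: prints 8
        System.out.println(countEmptyMatches("\\d*|[()+*-]", s));
        // operators are tried first; only the end of input yields an empty match: prints 1
        System.out.println(countEmptyMatches("[()+*-]|\\d*", s));
        // \d+ can never match the empty string: prints 0
        System.out.println(countEmptyMatches("\\d+|[()+*-]", s));
    }
}
```

The middle case shows that reordering the alternatives hides most of the problem but still leaves one empty match at the end of the input, whereas \\d+ removes it entirely.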

The regex you tried, \\d*|[()+*-], can be represented as

[image: diagram of \\d*|[()+*-]]

It matches zero or more digits.

You need to change it to match one or more digits, using the regex \\d+|[()+*-], which can be represented as

[image: diagram of \\d+|[()+*-]]
