Pattern/Matcher vs String.split() for the same regex

Why does Pattern/Matcher work with (\\d+)([a-zA-Z]+) but String.split() doesn't?

For example:

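import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "1A2B";

Pattern p = Pattern.compile("(\\d+)([a-zA-Z]+)");
Matcher m = p.matcher(line);
// groupCount() is the number of capturing groups in the pattern (2), not the number of matches
System.out.println(m.groupCount());

// find() scans for the next match; group() returns the whole matched text
while(m.find())
{
    System.out.println(m.group());
}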

Prints:

2
1A
2B

But:

    String line = "1A2B";
    String [] arrayOfStrings = line.split("(\\d+)([a-zA-Z]+)");
    System.out.println(arrayOfStrings.length);

    for(String elem: arrayOfStrings){
        System.out.println(elem);
    }

Prints only:

0

That is because split(String regex) uses the regular expression as a delimiter: it marks where to break the string, and the text it matches is thrown away. In your case the regex matches 1A and then 2B, so the only pieces left are the empty strings before, between and after those two matches. split(regex) removes trailing empty strings, and since all three pieces are empty, every one of them is dropped and you get an array of length 0. If you had 1A2B£$%^& instead, the result would have three elements: two leading empty strings (these are kept, because only trailing empty strings are removed) followed by £$%^&.
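A minimal, self-contained sketch of the delimiter behaviour described above, using the regex from the question. The class name SplitDelimiterDemo is only for illustration, and the pound sign is written as a Unicode escape so the source compiles regardless of file encoding.

```java
import java.util.Arrays;

public class SplitDelimiterDemo {
    // The regex from the question, used here purely as a delimiter.
    static final String REGEX = "(\\d+)([a-zA-Z]+)";

    public static void main(String[] args) {
        // "1A2B" is consumed entirely by the two delimiter matches (1A and 2B);
        // the three empty pieces left over are all trailing, so they are removed.
        String[] onlyDelimiters = "1A2B".split(REGEX);
        System.out.println(onlyDelimiters.length); // 0

        // With extra text after the matches, the two leading empty strings are kept,
        // because split(regex) only removes trailing empty strings.
        // The pound sign is written as a Unicode escape to avoid source-encoding issues.
        String[] withTail = "1A2B\u00A3$%^&".split(REGEX);
        System.out.println(withTail.length); // 3
        System.out.println(Arrays.toString(withTail));
    }
}
```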

Pattern/Matcher, on the other hand, uses the regular expression to find the substrings that match it and captures the parenthesized parts into groups. The matches and their groups can then be read back later, as you are doing with m.group().
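For contrast, a small sketch of the Pattern/Matcher side on the same input. The class name and the tokens helper are hypothetical names for illustration; group(), group(1) and group(2) are the standard Matcher accessors for the whole match and the two capturing groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherGroupsDemo {
    // Same pattern as in the question: group 1 = digits, group 2 = letters.
    static final Pattern TOKEN = Pattern.compile("(\\d+)([a-zA-Z]+)");

    // Hypothetical helper: collects every whole match; the matched text is kept, not discarded.
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher m = TOKEN.matcher("1A2B");
        while (m.find()) {
            // group() is the whole match; group(1) and group(2) are the captured parts.
            System.out.println(m.group() + " -> " + m.group(1) + ", " + m.group(2));
        }
        // prints:
        // 1A -> 1, A
        // 2B -> 2, B
        System.out.println(tokens("1A2B")); // [1A, 2B]
    }
}
```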

Why it didn't work

Because split() consumes the characters matched by the regex, so there are no characters left to put into the output array.
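One way to see that the matched characters really are consumed is to call split with a negative limit, which keeps trailing empty strings instead of removing them. This is a sketch; the class name is illustrative only.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    static final String REGEX = "(\\d+)([a-zA-Z]+)";

    public static void main(String[] args) {
        // A negative limit keeps trailing empty strings, so we can see exactly
        // what is left after the delimiters 1A and 2B have been consumed.
        String[] parts = "1A2B".split(REGEX, -1);
        System.out.println(parts.length);           // 3
        System.out.println(Arrays.toString(parts)); // [, , ]
    }
}
```

Only empty strings remain: every character of the input belonged to a delimiter.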

Solution

Not perfect, but a lookahead will help you:

String line = "1A2B";
// Zero-width lookahead: split before each digits+letters run without consuming anything
String [] arrayOfStrings = line.split("(?=\\d+[a-zA-Z]+)");
System.out.println(arrayOfStrings.length);

for(String elem: arrayOfStrings){
    System.out.println(elem);
}

will give this output on Java 7 and earlier:

3

1A
2B

Not perfect, because the lookahead is also true at the start of the string, which on Java 7 and earlier creates an empty string at index 0 of the output array. In the example you can see that the length is 3, whereas we expect 2. Since Java 8, a zero-width match at the beginning of the string no longer produces an empty leading substring, so the same code prints 2, 1A and 2B.
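A sketch checking the lookahead split on a current JDK, plus an alternative that is not from the answer: splitting only at a position preceded by a letter and followed by a digit, (?<=[a-zA-Z])(?=\\d). That pattern cannot match at index 0, so it never yields a leading empty string on any Java version. It assumes the input always alternates digit runs and letter runs, as in 1A2B; the class and constant names are illustrative.

```java
import java.util.Arrays;

public class LookaroundSplitDemo {
    // Lookahead from the answer: a zero-width split point before each digits+letters run.
    static final String LOOKAHEAD = "(?=\\d+[a-zA-Z]+)";
    // Alternative (not from the answer): split only between a letter and a digit,
    // which can never match at index 0.
    static final String LETTER_THEN_DIGIT = "(?<=[a-zA-Z])(?=\\d)";

    public static void main(String[] args) {
        String line = "1A2B";
        // On Java 8+, the zero-width match at index 0 produces no leading empty string.
        System.out.println(Arrays.toString(line.split(LOOKAHEAD)));         // [1A, 2B] on Java 8+
        // Same result on any Java version, since this pattern cannot match at the start.
        System.out.println(Arrays.toString(line.split(LETTER_THEN_DIGIT))); // [1A, 2B]
    }
}
```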
