
Different behavior of the same regular expression in Python and Java

First, my apologies: I don't know regular expressions that well.

I am using a regular expression to match a string. I tested it in the Python command-line interface, but when I ran it in Java, it produced a different result.

Python execution:

import re
re.search("[0-9]*[\\.[0-9]+]?[^0-9]*D\\([M|W]\\)\\s*US", "9.5 D(M) US")

gives the result:

<_sre.SRE_Match object; span=(0, 11), match='9.5 D(M) US'>

But the Java code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTest {
    // Same pattern as in the Python test above, as a Java string literal
    private static final Pattern FALLBACK_MEN_SIZE_PATTERN = Pattern.compile("[0-9]*[\\.[0-9]+]?[^0-9]*D\\([M|W]\\)\\s*US");

    public static void main(String[] args) {
        String strTest = "9.5 D(M) US";
        Matcher matcher = FALLBACK_MEN_SIZE_PATTERN.matcher(strTest);
        if (matcher.find()) {
            // group(0) is the entire match
            System.out.println(matcher.group(0));
        }
    }
}

gives the output:

5 D(M) US

I don't understand why it behaves differently.

Here is a pattern that works the same way in Java and Python:

"[0-9]*(?:\\.[0-9]+)?[^0-9]*D\\([MW]\\)\\s*US"

See the Python and Java demos.
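A minimal sketch that runs the question's pattern and the corrected one through java.util.regex on the same input, so the difference is visible side by side. The class name PatternComparison and the helper firstMatch are hypothetical, introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternComparison {
    // Pattern from the question (Java string literal form).
    static final String ORIGINAL = "[0-9]*[\\.[0-9]+]?[^0-9]*D\\([M|W]\\)\\s*US";
    // Corrected pattern: optional non-capturing group for the fraction, [MW] for "M or W".
    static final String FIXED = "[0-9]*(?:\\.[0-9]+)?[^0-9]*D\\([MW]\\)\\s*US";

    // Hypothetical helper: returns the first match found by find(), or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "9.5 D(M) US";
        System.out.println("original: " + firstMatch(ORIGINAL, input)); // original: 5 D(M) US
        System.out.println("fixed:    " + firstMatch(FIXED, input));    // fixed:    9.5 D(M) US
    }
}
```

With the corrected pattern the integer-only case still works, e.g. firstMatch(FIXED, "10 D(W) US") returns the whole string.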

In Python, [\\.[0-9]+]? is read as 2 subpatterns: [\\.[0-9]+ (one or more characters, each a ., a [, or a digit) and ]? (0 or 1 ]). See how your regex works in Python here. Or, in more detail with capturing groups, here.
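To make Python's reading concrete, the sketch below spells it out in Java syntax by escaping the brackets that Python treats as literals: [.\\[0-9]+ is one set ('.', '[' or a digit) repeated one or more times, and \\]? is an optional literal ]. The class name PythonReading is hypothetical; the pattern is intended to mirror how Python parses the original, not to be a recommended fix. It also exposes two side effects of that reading: an input like 9[5] is accepted, and the set part is mandatory.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PythonReading {
    // Python's interpretation of the question's pattern, written out for Java:
    // [.\[0-9]+ is ONE set ('.', '[' or a digit) repeated 1+ times; \]? is an optional literal ']'.
    static final Pattern AS_PYTHON_READS_IT =
            Pattern.compile("[0-9]*[.\\[0-9]+\\]?[^0-9]*D\\([M|W]\\)\\s*US");

    // Returns the first match, or null if the pattern matches nowhere in the input.
    static String firstMatch(String input) {
        Matcher m = AS_PYTHON_READS_IT.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("9.5 D(M) US"));  // 9.5 D(M) US
        System.out.println(firstMatch("9[5] D(M) US")); // 9[5] D(M) US ('[' and ']' are literals here)
        System.out.println(firstMatch("D(M) US"));      // null (the set needs at least one '.', '[' or digit)
    }
}
```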

In Java, it is read as one single character class. Java supports nested character classes as unions (for example, [a-d[m-p]] means a through d, or m through p), so the inner [0-9] is merged into the outer class and no literal [ or ] is matched. The whole subpattern therefore stands for 0 or 1 ., digit, or +. Since it can consume at most one character, it never matches .5 as a unit, and find() only succeeds once it starts at the 5 (you can get a visual hint at Visual Regex Tester: type 123.+[] as the input and [\.[0-9]+]? as the regex).
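A small probe of which characters the Java class actually contains, using the same test string 123.+[] suggested for the regex tester. The class name JavaClassMembership and the helper inClass are hypothetical; the trailing ? is dropped so that matches() tests exactly one character against the class.

```java
import java.util.regex.Pattern;

class JavaClassMembership {
    // The question's subpattern without its '?'. Java treats the nested [0-9] as a union,
    // so this is a single class equivalent to [.0-9+].
    static final Pattern CLASS = Pattern.compile("[\\.[0-9]+]");

    // True if the class matches exactly this one character.
    static boolean inClass(char c) {
        return CLASS.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        for (char c : "123.+[]".toCharArray()) {
            System.out.println(c + " -> " + inClass(c));
        }
        // Prints true for 1, 2, 3, '.', '+' and false for '[' and ']'.
    }
}
```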

And a final touch: [M|W] stands for M, |, or W, while I think you meant [MW], that is, M or W.
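A quick check of the [M|W] point: inside a character class, | is an ordinary character rather than alternation, so D(|) is accepted. The class and method names below are hypothetical. The group form (M|W) would be the alternation equivalent of [MW].

```java
import java.util.regex.Pattern;

class BracketAlternation {
    // Inside [...], '|' is a literal character, so this class is M, '|' or W.
    static boolean matchesPipeClass(String s) {
        return Pattern.matches("D\\([M|W]\\)", s);
    }

    // What was probably intended: exactly M or W.
    static boolean matchesMW(String s) {
        return Pattern.matches("D\\([MW]\\)", s);
    }

    public static void main(String[] args) {
        System.out.println(matchesPipeClass("D(|)")); // true: '|' is a member of [M|W]
        System.out.println(matchesMW("D(|)"));        // false
        System.out.println(matchesMW("D(W)"));        // true
    }
}
```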

I'm not a Python expert, so I can't tell you why it worked in Python, but in Java your problem is the [\\.[0-9]+]? part. You probably meant it to be (\\.[0-9]+)?.

As it is, it's a list of characters inside [] followed by a ?. That is, this part of the expression matches only a single character or none, so it cannot match .5 as a whole.
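The "single character or none" behavior can be measured directly: lookingAt() anchors the match at the start of the input, and end() reports how many characters were consumed. A minimal sketch; the class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalPartLength {
    // Number of characters the regex consumes at the start of the input, or -1 if it fails there.
    static int consumedAtStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        // Character class followed by '?': at most one character.
        System.out.println(consumedAtStart("[\\.[0-9]+]?", ".5")); // 1
        // Optional group: either the whole ".5" or nothing.
        System.out.println(consumedAtStart("(\\.[0-9]+)?", ".5")); // 2
    }
}
```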

Here is an illustration of the matching attempts:

[Image: graphical demonstration of the matching in Java]

Now, if your pattern used () instead of [], this would be the result:

[Image: the matching attempts with () instead of []]
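With () instead of [], the optional part becomes a real group, so it can also be read back as group(1). The sketch below uses the question's pattern with that change plus the [MW] fix; the class name CaptureFraction and the helper fraction are hypothetical. When the input has no fractional part, the optional group does not participate and group(1) returns null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureFraction {
    // Question's pattern with a capturing group instead of [...], and [MW] instead of [M|W].
    static final Pattern SIZE = Pattern.compile("[0-9]*(\\.[0-9]+)?[^0-9]*D\\([MW]\\)\\s*US");

    // Returns the captured fractional part (e.g. ".5"), or null if it is absent or nothing matches.
    static String fraction(String input) {
        Matcher m = SIZE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(fraction("9.5 D(M) US")); // .5
        System.out.println(fraction("10 D(W) US"));  // null (the optional group did not participate)
    }
}
```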
