Different behavior of the same regular expression in Python and Java
First, my apologies: I don't know regular expressions that well.
I am using a regular expression to match a string. I tested it in the Python command-line interface, but when I ran it in Java, it produced a different result.
Python execution:
re.search("[0-9]*[\\.[0-9]+]?[^0-9]*D\\([M|W]\\)\\s*US", "9.5 D(M) US");
gives this result:
<_sre.SRE_Match object; span=(0, 11), match='9.5 D(M) US'>
But the Java code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTest {
    private static final Pattern FALLBACK_MEN_SIZE_PATTERN = Pattern.compile("[0-9]*[\\.[0-9]+]?[^0-9]*D\\([M|W]\\)\\s*US");

    public static void main(String[] args) {
        String strTest = "9.5 D(M) US";
        Matcher matcher = FALLBACK_MEN_SIZE_PATTERN.matcher(strTest);
        if (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}
gives this output:
5 D(M) US
I don't understand why it behaves differently.
Here is a pattern that works the same way in Java and Python:
"[0-9]*(?:\\.[0-9]+)?[^0-9]*D\\([MW]\\)\\s*US"
See the Python and Java demos.
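A minimal Java sketch of the fix, assuming the question's input string; the class FixedPatternDemo and its firstMatch helper are hypothetical names used only for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class using the corrected pattern.
class FixedPatternDemo {
    // (?:\.[0-9]+)? is an optional non-capturing group: a dot followed by one or more digits.
    // [MW] matches exactly M or W.
    private static final Pattern SIZE_PATTERN =
            Pattern.compile("[0-9]*(?:\\.[0-9]+)?[^0-9]*D\\([MW]\\)\\s*US");

    // Hypothetical helper: returns the first match in the input, or null if there is none.
    static String firstMatch(String input) {
        Matcher matcher = SIZE_PATTERN.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("9.5 D(M) US")); // prints 9.5 D(M) US
    }
}
```

The group keeps the dot and its digits together, so the optional part is all or nothing rather than a single character.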
In Python, [\\.[0-9]+]? is read as 2 subpatterns: [\\.[0-9]+ (one or more of '.', '[' or a digit) and ]? (zero or one literal ']'). See how your regex works in Python here. Or, in more detail with capturing groups, here.
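To check Python's reading from the Java side, one can escape the inner brackets so that Java parses the subpattern the way Python does. This is a sketch for illustration only (PythonReadingDemo is a hypothetical name), not a pattern worth keeping:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: Python's interpretation of [\.[0-9]+]? spelled out in Java syntax.
class PythonReadingDemo {
    // [.\[0-9]+  -> one or more of '.', '[' or a digit
    // \]?        -> an optional literal ']'
    private static final Pattern PYTHON_READING =
            Pattern.compile("[0-9]*[.\\[0-9]+\\]?[^0-9]*D\\([M|W]\\)\\s*US");

    // Returns the first match in the input, or null if there is none.
    static String firstMatch(String input) {
        Matcher matcher = PYTHON_READING.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Same result as the Python session in the question.
        System.out.println(firstMatch("9.5 D(M) US")); // prints 9.5 D(M) US
    }
}
```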
In Java, it is read as one single character class. Java allows a character class to be nested inside another, so the inner [0-9] does not end the outer class; its digits are merged into it, and the whole subpattern stands for 0 or 1 '.', digit, or '+'. Because it can match at most one character, it cannot consume .5, so no match can start at the 9, and find() returns the match that starts at the 5 (you can get a visual hint at Visual Regex Tester: type 123.+[] as input and [\\.[0-9]+]? as regex).
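In Java's syntax, a [...] written inside a character class is a nested class whose members are added to the outer one (a union). A small sketch that tests which single characters the bracketed part accepts on its own; JavaClassDemo and accepts are hypothetical names:

```java
import java.util.regex.Pattern;

// Hypothetical demo: which single characters does Java's [\.[0-9]+] accept?
class JavaClassDemo {
    // The inner [0-9] is a nested class merged into the outer class,
    // so this is ONE class containing '.', the digits, and '+'.
    private static final Pattern NESTED = Pattern.compile("[\\.[0-9]+]");

    // matches() requires the whole string to match, i.e. exactly one character here.
    static boolean accepts(String s) {
        return NESTED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {".", "5", "+", "[", "]", ".5"}) {
            System.out.println("\"" + s + "\" -> " + accepts(s));
        }
    }
}
```

Under this reading '.', '5' and '+' are accepted, while '[', ']' and the two-character '.5' are not.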
And a final touch: [M|W] stands for M, |, or W, while I think you meant [MW] = M or W.
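A quick Java check of that point; PipeInClassDemo and its two helpers are hypothetical names. Inside a character class, | is an ordinary character, not alternation:

```java
import java.util.regex.Pattern;

// Hypothetical demo: '|' inside [...] is a literal character.
class PipeInClassDemo {
    // Matches exactly one of: M, '|', W
    static boolean pipeClassMatches(String s) {
        return Pattern.matches("[M|W]", s);
    }

    // Matches exactly one of: M, W
    static boolean plainClassMatches(String s) {
        return Pattern.matches("[MW]", s);
    }

    public static void main(String[] args) {
        System.out.println(pipeClassMatches("|"));  // true
        System.out.println(plainClassMatches("|")); // false
    }
}
```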
I'm not a Python expert, so I can't tell you why it worked in Python, but in Java your problem is the [\\.[0-9]+]? part. You probably meant it to be (\\.[0-9]+)?.
As it is, it's a list of characters inside [] followed by a ?. That is, this part of the expression matches only a single character or none, so it cannot match .5 as a whole.
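The consequence can be seen by asking where find() starts its match with the question's unchanged pattern. A minimal sketch; MatchStartDemo and matchStart are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: where does find() start matching with the original pattern?
class MatchStartDemo {
    // The question's pattern, unchanged.
    private static final Pattern ORIGINAL =
            Pattern.compile("[0-9]*[\\.[0-9]+]?[^0-9]*D\\([M|W]\\)\\s*US");

    // Returns the start index of the first match, or -1 if there is none.
    static int matchStart(String input) {
        Matcher matcher = ORIGINAL.matcher(input);
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        // The optional class consumes at most one character ('.' or '5'), never ".5",
        // so attempts at index 0 and 1 fail and the match begins at the '5'.
        System.out.println(matchStart("9.5 D(M) US")); // 2
    }
}
```

Index 2 is the position of the 5, which is why the question's program prints 5 D(M) US.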
Here is an illustration of the matching attempts:
Now, if your pattern used () instead of [], this would be the result:
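A Java sketch of the () version, keeping [M|W] unchanged so that only the bracket type differs; GroupVersionDemo and matchParts are hypothetical names. With a capturing group, group 1 holds the decimal part:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: the question's pattern with [ ] around the decimal part replaced by ( ).
class GroupVersionDemo {
    // (\.[0-9]+)? is an optional capturing group: a dot followed by one or more digits.
    private static final Pattern GROUPED =
            Pattern.compile("[0-9]*(\\.[0-9]+)?[^0-9]*D\\([M|W]\\)\\s*US");

    // Returns {whole match, group 1}, or null if nothing matches.
    // Group 1 is null when the decimal part is absent.
    static String[] matchParts(String input) {
        Matcher matcher = GROUPED.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        return new String[] {matcher.group(), matcher.group(1)};
    }

    public static void main(String[] args) {
        String[] parts = matchParts("9.5 D(M) US");
        System.out.println(parts[0]); // 9.5 D(M) US
        System.out.println(parts[1]); // .5
    }
}
```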