
Regular expression doesn't match empty string in multiline mode (Java)

I just observed this behavior:

Pattern p1 = Pattern.compile("^$");
Matcher m1 = p1.matcher("");
System.out.println(m1.matches()); /* true */

Pattern p2 = Pattern.compile("^$", Pattern.MULTILINE);
Matcher m2 = p2.matcher("");
System.out.println(m2.matches()); /* false */

It strikes me as odd that the last statement prints false. This is what the docs say:

By default, the regular expressions ^ and $ ignore line terminators and only match at the beginning and the end, respectively, of the entire input sequence. If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input. When in MULTILINE mode $ matches just before a line terminator or the end of the input sequence.
http://docs.oracle.com/javase/1.4.2...

From what I understand of this, it should match? The following makes things even more confusing:

Pattern p3 = Pattern.compile("^test$");
Matcher m3 = p3.matcher("test");
System.out.println(m3.matches()); /* true */

Pattern p4 = Pattern.compile("^test$", Pattern.MULTILINE);
Matcher m4 = p4.matcher("test");
System.out.println(m4.matches()); /* true */

So what is going on? How do I make sense of this? I hope someone can shed some light on this; it would be really appreciated.

If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input.

With an empty input, position 0 is both the beginning and the end of input. Since you are at the end of input, ^ can't match in multiline mode.

This is surprising, even disgusting, but it is nevertheless in line with the documentation.
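To see the "except at the end of input" rule in isolation, here is a minimal sketch that counts how often ^ is found in a few inputs. The class name CaretAtEndOfInput and the helper countMatches are made up for this illustration; only Pattern and Matcher come from java.util.regex. It uses find() rather than matches() so that every candidate position is tried.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaretAtEndOfInput {
    // Hypothetical helper: counts how many times find() succeeds on the input.
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Default mode: ^ matches at the beginning of input, even when it is empty.
        System.out.println(countMatches("^", 0, ""));                     // 1
        // MULTILINE: position 0 of "" is also the end of input, so ^ is refused.
        System.out.println(countMatches("^", Pattern.MULTILINE, ""));     // 0
        // MULTILINE: ^ matches at the beginning and after the line terminator.
        System.out.println(countMatches("^", Pattern.MULTILINE, "a\nb")); // 2
        // MULTILINE: the position after the trailing terminator is the end of input.
        System.out.println(countMatches("^", Pattern.MULTILINE, "a\n"));  // 1
    }
}
```

The last two lines show that the rule is not specific to the empty string: any position that coincides with the end of input is excluded for ^ in multiline mode.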

Let's look a bit closer at your second example:

Pattern p2 = Pattern.compile("^$", Pattern.MULTILINE);
Matcher m2 = p2.matcher("");
System.out.println(m2.matches()); /* false */

So the line you have in m2 is empty, or contains only a line terminator and no other characters. Therefore your pattern, in order to correspond to the given line, should be only "$", i.e.:

// Your example
Pattern p2 = Pattern.compile("^$", Pattern.MULTILINE);
Matcher m2 = p2.matcher("");
System.out.println(m2.matches()); /* false */

// Let's check if it is start of the line
p2 = Pattern.compile("^", Pattern.MULTILINE);
m2 = p2.matcher("");
System.out.println(m2.matches()); /* false */

// Let's check if it is end of the line
p2 = Pattern.compile("$", Pattern.MULTILINE);
m2 = p2.matcher("");
System.out.println(m2.matches()); /* true */

Sounds like a bug. At most, in multi-line mode, "^" and "$" could additionally be interpreted as matching at an internal line boundary. Java might not have an extended variable state structure like Perl does; I don't know if this is even a cause.

The fact that /^test$/m matches just proves that ^ and $ work in multi-line mode except when the string is empty (in Java). That said, a multi-line mode test for an empty string is clearly ludicrous, since /^$/ works for that.

Testing in Perl, everything works as expected:

if ( "" =~ /^$/m   ) { print "/^\$/m    matches\n"; }
if ( "" =~ /^$/    ) { print "/^\$/     matches\n"; }
if ( "" =~ /\A\Z/m ) { print "/\\A\\Z/m  matches\n"; }
if ( "" =~ /\A\Z/  ) { print "/\\A\\Z/   matches\n"; }
if ( "" =~ /\A\z/  ) { print "/\\A\\z/   matches\n"; }
if ( "" =~ /^/m    ) { print "/^/m     matches\n"; }
if ( "" =~ /$/m    ) { print "/\$/m     matches\n"; }


__END__


/^$/m    matches
/^$/     matches
/\A\Z/m  matches
/\A\Z/   matches
/\A\z/   matches
/^/m     matches
/$/m     matches
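For comparison, the same seven checks can be sketched in Java. This is an illustration only: the class name EmptyStringAnchors and the helper foundInEmpty are invented here, and find() is used because it is the closest counterpart to Perl's unanchored =~ operator.

```java
import java.util.regex.Pattern;

public class EmptyStringAnchors {
    // Hypothetical helper: true if the pattern is found somewhere in the empty string.
    static boolean foundInEmpty(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher("").find();
    }

    public static void main(String[] args) {
        int m = Pattern.MULTILINE;
        System.out.println("^$    (MULTILINE): " + foundInEmpty("^$", m));       // false
        System.out.println("^$               : " + foundInEmpty("^$", 0));       // true
        System.out.println("\\A\\Z  (MULTILINE): " + foundInEmpty("\\A\\Z", m)); // true
        System.out.println("\\A\\Z             : " + foundInEmpty("\\A\\Z", 0)); // true
        System.out.println("\\A\\z             : " + foundInEmpty("\\A\\z", 0)); // true
        System.out.println("^     (MULTILINE): " + foundInEmpty("^", m));        // false
        System.out.println("$     (MULTILINE): " + foundInEmpty("$", m));        // true
    }
}
```

Only the two MULTILINE patterns that contain ^ fail. \A is not affected by the MULTILINE flag, so \A\z is one way to test for an empty input regardless of the flag.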
