
Why does regular expression not match without boundary matcher "Beginning of line"?

There is something I don't understand about Java's regular expressions. I have the following string (and I need the "To Date"):

From Date :01/11/2011 To Date :30/11/2011;;;;;;;;;;;;;

I think that the following regular expression would have matched in Perl:

to\\s+date\\s*?:\\s*?([0-9]{2}[\\./][0-9]{2}[\\./][0-9]{2,4})

In Java, this pattern doesn't match. But it does if I add a .+ at the front and at the end. So this pattern works in Java:

Pattern p = Pattern.compile(".+to\\s+date\\s*?:\\s*?([0-9]{2}[\\./][0-9]{2}[\\./][0-9]{2,4}).+", Pattern.CASE_INSENSITIVE);

What I don't understand: if I added a ^ (beginning of line) at the front and a $ (end of line) at the end, it would be clear to me why the first pattern does not match in Java, because the pattern would then have to match the whole line. But without those anchors the first pattern should actually match: why would the pattern care about string data outside its own scope if I don't set delimiters at the front and at the end? This is not logical to me. In my opinion the first pattern should behave like the contains method of the String class, and I think that is how it works in Perl.

In Java, matches() validates the entire string. Your input probably contains line breaks, which . (and therefore .+) does not match by default.

Try this instead:

Pattern p = Pattern.compile(".+to\\s+date\\s*?:\\s*?([0-9]{2}[\\./][0-9]{2}[\\./][0-9]{2,4}).+", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("... \n From Date :01/11/2011 To Date :30/11/2011;;;;;;;;;;;;; \n ...");

System.out.println(m.matches()); // prints false

if(m.find()) {
  System.out.println(m.group(1)); // prints 30/11/2011
}
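
As a side note on the line-break remark, here is a minimal sketch of how the same pattern behaves under matches() with and without Pattern.DOTALL, the standard flag that lets . match line terminators. The class name DotallDemo and the helper matchWhole are made up for illustration; the input is the one used in the snippet above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper are illustrative only.
class DotallDemo {
    static final String REGEX =
        ".+to\\s+date\\s*?:\\s*?([0-9]{2}[./][0-9]{2}[./][0-9]{2,4}).+";

    // Returns the captured date if the WHOLE input matches, otherwise null.
    static String matchWhole(String input, int flags) {
        Matcher m = Pattern.compile(REGEX, flags).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input =
            "... \n From Date :01/11/2011 To Date :30/11/2011;;;;;;;;;;;;; \n ...";

        // Without DOTALL, .+ stops at the line breaks, so matches() fails.
        System.out.println(matchWhole(input, Pattern.CASE_INSENSITIVE)); // prints null

        // With DOTALL, . also matches \n, so the whole input can be consumed.
        System.out.println(
            matchWhole(input, Pattern.CASE_INSENSITIVE | Pattern.DOTALL)); // prints 30/11/2011
    }
}
```

Whether DOTALL or find() is the better fit depends on whether you really want to validate the whole input or just locate the date inside it.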

And when using find(), you can drop the .+ from both ends of the pattern:

Pattern.compile("to\\s+date\\s*?:\\s*?([0-9]{2}[./][0-9]{2}[./][0-9]{2,4})", Pattern.CASE_INSENSITIVE);

(There is no need to escape the . inside a character class, by the way.)
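
Putting the two points together, a small self-contained sketch of the "contains"-style search the question asks for might look like this. The class name FindDemo and the method extractToDate are hypothetical; the pattern is the simplified one, with the unescaped [./] character class and no surrounding .+.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper are illustrative only.
class FindDemo {
    private static final Pattern TO_DATE = Pattern.compile(
        "to\\s+date\\s*?:\\s*?([0-9]{2}[./][0-9]{2}[./][0-9]{2,4})",
        Pattern.CASE_INSENSITIVE);

    // Searches anywhere in the input (like Perl's default behaviour)
    // and returns the captured date, or null if nothing is found.
    static String extractToDate(String input) {
        Matcher m = TO_DATE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "From Date :01/11/2011 To Date :30/11/2011;;;;;;;;;;;;;";

        // matches() must consume the entire string, so it fails here.
        System.out.println(TO_DATE.matcher(line).matches()); // prints false

        // find() only needs a matching substring.
        System.out.println(extractToDate(line)); // prints 30/11/2011
    }
}
```

Compiling the pattern once into a static field is just a convenience for the sketch; calling Pattern.compile inline as in the answer works the same way.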

I think this answer to a different question also answers yours: Why do regular expressions in Java and Perl act differently?
