Java regex: why are these two regular expressions different?
I have a Java string representing a div element:
String source = "<div class = \"ads\">\n" +
"\t<dl style = \"font-size:14px; color:blue;\">\n" +
"\t\t<li>\n" +
"\t\t\t<a href = \"http://ggicci.blog.163.com\" target = \"_blank\">Ggicci's Blog</a>\n" +
"\t\t</li>\n" +
"\t</dl>\n" +
"</div>\n";
which in HTML form is:
<div class = "ads">
    <dl style = "font-size:14px; color:blue;">
        <li>
            <a href = "http://ggicci.blog.163.com" target = "_blank">Ggicci's Blog</a>
        </li>
    </dl>
</div>
I wrote this regex to extract the dl element:
<dl[.\\s]*?>[.\\s]*?</div>
But it finds nothing, so I modified it to:
<dl(.|\\s)*?>(.|\\s)*?</div>
Then it works. So I tested it like this:
System.out.println(Pattern.matches("[.\\s]", "a"));    // prints false
System.out.println(Pattern.matches("[abc\\s]", "a"));  // prints true
So why can't the '.' match 'a'?
Inside square brackets, characters are treated literally. [.\\s] means "match a dot, a backslash, or an s", and (.|\\s) is equivalent to a bare . on its own.
I think you really want the following regex:
<dl[^>]*>.*?</div>
+1 for the above.
I would do:
<dl[^>]*>(.*?)</dl>
to match the content of the dl.
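As a rough sketch of how that capturing group could be used: the class name DlContentDemo, the helper dlContent, and the shortened sample markup are all hypothetical. Because the dl in the question spans several lines, the sketch also assumes Pattern.DOTALL is passed, which is not part of the regex as written above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DlContentDemo {
    // Returns the text between <dl ...> and </dl>, or null when no dl element is found.
    static String dlContent(String html) {
        // DOTALL is an addition in this sketch so that '.' can cross line breaks.
        Pattern p = Pattern.compile("<dl[^>]*>(.*?)</dl>", Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened sample markup (not the string from the question).
        String sample = "<div><dl class=\"x\">\n<li>item</li>\n</dl></div>";
        System.out.println(dlContent(sample).trim()); // <li>item</li>
    }
}
```

Group 1 holds only what lies between the opening and closing tags, including the surrounding line breaks, which is why the usage above trims it.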
The syntax [.\\s] makes no sense because, as Daniel said, the . just means "a dot" in this context.
Why can't you replace your [.\\s] with a much simpler . ?
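The behaviour of . inside a character class can be checked directly. The following is only a small sketch (the class name CharClassDotDemo and the helper fullMatch are made up for illustration); it uses the same Java string literal as the question.

```java
import java.util.regex.Pattern;

class CharClassDotDemo {
    // True when the whole input matches the regex (same semantics as Pattern.matches).
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Inside a character class, '.' is a literal dot.
        System.out.println(fullMatch("[.\\s]", "a"));  // false
        System.out.println(fullMatch("[.\\s]", "."));  // true
        System.out.println(fullMatch("[.\\s]", " "));  // true
        // "\\s" in the Java literal is the regex \s, not a backslash plus 's'.
        System.out.println(fullMatch("[.\\s]", "s"));  // false
        System.out.println(fullMatch("[.\\s]", "\\")); // false
        // Outside a character class, '.' matches any character except a line terminator.
        System.out.println(fullMatch(".", "a"));       // true
        System.out.println(fullMatch(".", "\n"));      // false
    }
}
```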
When you include regexes in a post, it's a good idea to post them as you're actually using them: in this case, as Java string literals.
"[.\\s]" is a Java string literal representing the regex [.\s]; it matches a literal dot or a whitespace character. Your regex is not trying to match a backslash or an 's', as others have said, but the crucial factor is that . loses its special meaning inside a character class.
"(.|\\s)" is a Java string literal representing the regex (.|\s); it matches (anything but a line separator character OR any whitespace character). It works as you intended, but don't use it! It leaves you extremely vulnerable to catastrophic backtracking, as explained in this answer.
But no worries, all you really need to do is use DOTALL mode (also known as single-line mode), which enables . to match anything including line separator characters:
(?s)<dl\b[^>]*>.*?</dl>
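A minimal sketch of that regex applied to the string from the question follows. The class name DlExtractDemo and its two helper methods are hypothetical; the second helper only exists to show that the same pattern without (?s) finds nothing in this input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DlExtractDemo {
    // Same markup as the source string in the question.
    static final String SOURCE = "<div class = \"ads\">\n" +
            "\t<dl style = \"font-size:14px; color:blue;\">\n" +
            "\t\t<li>\n" +
            "\t\t\t<a href = \"http://ggicci.blog.163.com\" target = \"_blank\">Ggicci's Blog</a>\n" +
            "\t\t</li>\n" +
            "\t</dl>\n" +
            "</div>\n";

    // Returns the first <dl ...>...</dl> element, or null if there is none.
    static String extractDl(String html) {
        // (?s) turns on DOTALL, so '.' also matches line terminators.
        Matcher m = Pattern.compile("(?s)<dl\\b[^>]*>.*?</dl>").matcher(html);
        return m.find() ? m.group() : null;
    }

    // Same pattern without (?s): '.' cannot cross the newline after the opening tag.
    static boolean findsWithoutDotall(String html) {
        return Pattern.compile("<dl\\b[^>]*>.*?</dl>").matcher(html).find();
    }

    public static void main(String[] args) {
        System.out.println(extractDl(SOURCE));          // the whole dl element
        System.out.println(findsWithoutDotall(SOURCE)); // false
    }
}
```

Passing Pattern.DOTALL as the second argument to Pattern.compile should behave the same way as the inline (?s) flag; the inline form is used here only because it matches the regex as posted.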