
Java Regular Expression Escape Sequence

I was trying to match the example in <p><a href="example/index.html">LinkToPage</a></p>.

With rubular.com I could get something like <a href=\"(.*)?\/index.html\">.*<\/a>.

I'll be using this in Pattern.compile in Java. I know that \ has to be escaped as well, and I've come up with <a href=\\\"(.*)?\\\/index.html\\\">.*<\\\/a> and a few more variations, but I'm getting it wrong. I tested on regexplanet. Can anyone help me with this?

Use "<a href=\"(.*)/index.html\">.*</a>" in your Java code.

You only need to escape " because it's a Java string literal.

You don't need to escape /, because you aren't delimiting your regex with slashes (as you would be in Ruby).
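As a minimal sketch of the point above, the pattern can be compiled and applied to the sample markup from the question. The class and method names (LinkDirExtractor, extractDir) are hypothetical and only serve the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkDirExtractor {
    // Only the double quotes are escaped, and only because this is a Java
    // string literal. The forward slashes need no escaping in a Java regex.
    private static final Pattern LINK =
            Pattern.compile("<a href=\"(.*)/index.html\">.*</a>");

    // Returns the text captured before /index.html, or null if no link matches.
    static String extractDir(String html) {
        Matcher m = LINK.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"example/index.html\">LinkToPage</a></p>";
        System.out.println(extractDir(html)); // prints "example"
    }
}
```

Note that find() is used rather than matches(), since the link is only part of the surrounding <p> element.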

Also, (.*)? makes no sense. Just use (.*). * can already match "nothing", so there's no point in having the ?.
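To illustrate why the trailing ? adds nothing, here is a small sketch that runs both variants against the same inputs. The helper name firstGroup and the second sample link (with an empty directory part) are assumptions made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupDemo {
    static final String PLAIN = "<a href=\"(.*)/index.html\">.*</a>";
    static final String OPTIONAL = "<a href=\"(.*)?/index.html\">.*</a>";

    // Returns group 1 of the first match, or null when the regex does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String named = "<a href=\"example/index.html\">LinkToPage</a>";
        String empty = "<a href=\"/index.html\">Home</a>"; // hypothetical input

        // Both variants capture the same directory name.
        System.out.println(firstGroup(PLAIN, named));
        System.out.println(firstGroup(OPTIONAL, named));

        // (.*) alone already matches when there is nothing before /index.html.
        System.out.println("[" + firstGroup(PLAIN, empty) + "]");
    }
}
```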

Pattern.compile("<a href=\"(.*)?/index.html\">.*</a>");

That should fix your regex. You do not need to escape the forward slashes.

However, I am obligated to present you with the standard caution against parsing HTML with regex:

RegEx match open tags except XHTML self-contained tags

You can tell Java what to match literally and call Pattern.quote(str) to have it escape the right things.
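One way this could look, as a rough sketch: the literal pieces of the link are passed through Pattern.quote and joined with the unquoted (.*) and .* parts. The class QuoteDemo and its methods are hypothetical names. Pattern.quote only takes care of regex metacharacters (such as the dot in index.html); the \" escapes are still required by the Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteDemo {
    // Builds the link regex from literal pieces. Pattern.quote makes each piece
    // match literally, so '.' in the file name is no longer a wildcard.
    static Pattern linkPattern(String fileName) {
        return Pattern.compile(
                Pattern.quote("<a href=\"")
                + "(.*)"
                + Pattern.quote("/" + fileName + "\">")
                + ".*"
                + Pattern.quote("</a>"));
    }

    // Returns the captured directory part, or null if the input does not match.
    static String extractDir(String html, String fileName) {
        Matcher m = linkPattern(fileName).matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"example/index.html\">LinkToPage</a></p>";
        System.out.println(extractDir(html, "index.html")); // prints "example"
    }
}
```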
