Java, regular expressions and matchers

Question

I've got a friend who had this working at one point. While learning regular expressions, I don't understand why the pattern would contain a /, since the sandbox testers balk at it.

private static final Pattern SUB_URL_PATTERN = Pattern.compile("href=\"(/*\\w*/*\\w*/\\d+.html)\">",Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

What is the / in the above regex pattern trying to do? This pattern is broken and I'm not sure how to fix it.

This is how it comes out in the debugger:

href="(/*\w*/*\w*/\d+.html)">

Is this how the regex would break down?

href="     ... matches href="
/*         ... matches 0 or more occurrences of /   
\w*        ... matches 0 or more occurrences of word characters   
/*         ... matches 0 or more occurrences of /   
\w*        ... matches 0 or more occurrences of word characters   
/          ... matches a /  
\d+        ... matches one or several digits   
.html)">   ... matches /html

Here is the snippet of webpage source that it should be hitting on to capture href="/reo/4890530477.html":

<a href="/reo/4890530477.html" class="i" data-ids="0:00j0j_jDfSzBcGgid"></a>

Answer 1

final Pattern SUB_URL_PATTERN = Pattern.compile("href=\"/\\w+/\\w+/\\d+\\.html\"");

should match

href="/[word]/[word]/[number].html"

You might want:

final Pattern SUB_URL_PATTERN = Pattern.compile("href=\"(/\\w+)*/\\d+\\.html\"");

which will match

href="[0+ groups of '/word']/[number].html"
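As a rough check of the two patterns above against the anchor tag from the question, here is a minimal sketch. The class name SubUrlDemo, the method names, and the extra capturing group around the path (so the href value can be read back with group(1)) are assumptions added for illustration; they are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubUrlDemo {
    // First pattern from the answer: exactly two word segments before the number.
    static final Pattern TWO_SEGMENTS =
            Pattern.compile("href=\"/\\w+/\\w+/\\d+\\.html\"");

    // Second pattern from the answer, with an outer capturing group added
    // (assumption) and the repeated segment made non-capturing.
    static final Pattern ANY_SEGMENTS =
            Pattern.compile("href=\"((?:/\\w+)*/\\d+\\.html)\"");

    // Returns the captured path, or null when the pattern is not found.
    static String extractSubUrl(String html) {
        Matcher m = ANY_SEGMENTS.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Reports whether the two-segment pattern occurs anywhere in the input.
    static boolean hasTwoSegmentUrl(String html) {
        return TWO_SEGMENTS.matcher(html).find();
    }

    public static void main(String[] args) {
        String html = "<a href=\"/reo/4890530477.html\" class=\"i\" "
                + "data-ids=\"0:00j0j_jDfSzBcGgid\"></a>";
        System.out.println(extractSubUrl(html));     // /reo/4890530477.html
        System.out.println(hasTwoSegmentUrl(html));  // false: only one word segment
    }
}
```

Note that the sample tag has a space after the closing quote of the href value, so any pattern that insists on "> immediately after .html will not find it.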

In Java, you need to write two backslashes (\\) in a string literal to get a string that contains one backslash. For example, if you want a regex pattern of \d, you need to declare the string as "\\d", because the Java language uses the same escape character that regexes do.
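The escaping rule can be seen directly in a small sketch. The class and method names here are made up for illustration; only the standard java.util.regex.Pattern API is used.

```java
import java.util.regex.Pattern;

public class BackslashDemo {
    // Written with two backslashes in source; the resulting string holds one.
    static String digitsRegex() {
        return "\\d+";
    }

    // An unescaped dot matches any character; an escaped dot matches only '.'.
    static boolean looseDotMatches(String input) {
        return Pattern.matches("\\d+.html", input);
    }

    static boolean strictDotMatches(String input) {
        return Pattern.matches("\\d+\\.html", input);
    }

    public static void main(String[] args) {
        System.out.println(digitsRegex());                // \d+
        System.out.println(digitsRegex().length());       // 3
        System.out.println(looseDotMatches("123xhtml"));  // true
        System.out.println(strictDotMatches("123xhtml")); // false
        System.out.println(strictDotMatches("123.html")); // true
    }
}
```

The same rule is why the dot before html is written as \\. in the patterns above: without the escape, the dot would accept any character in that position.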

I highly recommend you take maybe an hour to go through the following free regex tutorial:

http://regexone.com/

It's interactive and a piece of cake to get through. When you finish, I guarantee you'll understand regexes 100x better.

To second Jens, it's probably a better idea to use an HTML parser than to use regexes for this. You might check out jsoup; it's what I use.

http://jsoup.org/

Answer 2

The character / does not have any special meaning in the Java regular expression syntax. It is just that: a literal /.

The metacharacters supported by the Java regex API are: <([{\^-=$!|]})?*+.>

See here: http://docs.oracle.com/javase/tutorial/essential/regex/literals.html
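To illustrate the point, here is a short sketch showing that / needs no escaping while a metacharacter such as + does. The class name and helper method are hypothetical; Pattern.matches and Pattern.quote are standard java.util.regex calls.

```java
import java.util.regex.Pattern;

public class SlashLiteralDemo {
    // True when the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // '/' is an ordinary character, so it matches itself.
        System.out.println(matchesWhole("/reo/", "/reo/"));   // true

        // "/*" is just "zero or more slashes": '*' is the metacharacter here.
        System.out.println(matchesWhole("/*reo", "///reo"));  // true
        System.out.println(matchesWhole("/*reo", "reo"));     // true

        // '+' is a metacharacter, so the raw text does not match itself...
        System.out.println(matchesWhole("1+1=2", "1+1=2"));   // false

        // ...unless it is quoted so every character is taken literally.
        System.out.println(matchesWhole(Pattern.quote("1+1=2"), "1+1=2")); // true
    }
}
```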

Answer 1
1 2015-02-13 22:55:54

Answer 2
0 2015-02-13 22:34:15