Java, regular expressions, & Matcher
I've got a friend who had this working at one point. While learning regular expressions, I don't understand why the pattern would contain a /, since the sandbox testers balk at it.
private static final Pattern SUB_URL_PATTERN = Pattern.compile("href=\"(/*\\w*/*\\w*/\\d+.html)\">",Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
What is the / in the above regex pattern trying to do? This pattern is broken and I'm not sure how to fix it.
This is how it comes out in the debugger:
href="(/*\w*/*\w*/\d+.html)">
Is this how the regex would break down?
href=" ... matches href="
/* ... matches 0 or more occurrences of /
\w* ... matches 0 or more occurrences of word characters
/* ... matches 0 or more occurrences of /
\w* ... matches 0 or more occurrences of word characters
/ ... matches a /
\d+ ... matches one or several digits
.html)"> ... matches .html"> (the unescaped . actually matches any single character, and the ) closes the capture group)
Here is the snippet of webpage source that it should be hitting on to capture href="/reo/4890530477.html":
<a href="/reo/4890530477.html" class="i" data-ids="0:00j0j_jDfSzBcGgid"></a>
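final Pattern SUB_URL_PATTERN = Pattern.compile("href=\"/\\w+/\\w+/\\d+\\.html\"");
should match
href="/[word]/[word]/[number].html"
Note that this requires two word segments before the number, so it will not match the single-segment sample link /reo/4890530477.html from the question.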
You might want:
final Pattern SUB_URL_PATTERN = Pattern.compile("href=\"(/\\w+)*/\\d+\\.html\"");
which will match
href="[0+ groups of '/word']/[number].html"
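As a rough illustration, here is a minimal sketch of how a pattern like the second one could be run against the sample link from the question. The class name SubUrlDemo and the helper extractSubUrls are made up for this example, and I've assumed you want the whole path in capture group 1, so the repeated "/word" part is wrapped in a non-capturing group (?:...) inside one outer group. Adjust to taste.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubUrlDemo {
    // Group 1 captures the path: zero or more "/word" segments, then "/digits.html".
    // The outer capturing group is an assumption about what the caller wants back.
    static final Pattern SUB_URL_PATTERN =
            Pattern.compile("href=\"((?:/\\w+)*/\\d+\\.html)\"", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns every captured path found in the given HTML text.
    static List<String> extractSubUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = SUB_URL_PATTERN.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/reo/4890530477.html\" class=\"i\" "
                + "data-ids=\"0:00j0j_jDfSzBcGgid\"></a>";
        System.out.println(extractSubUrls(html)); // prints [/reo/4890530477.html]
    }
}
```

Note the use of find() in a loop rather than matches(): matches() only succeeds if the entire input is the link, which is almost never the case for a page of HTML.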
With Java, you need to write two backslashes (\\) in a string literal to get a string that contains one backslash. For example, if you wanted a regex pattern of \d, you would need a string declared as "\\d", because the Java language uses the same escape character that regexes do.
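If it helps, here is a small sketch of the two levels of escaping. The class and constant names are just for illustration. The Java compiler first turns "\\d" into the two-character string \d, and only then does the regex engine read \d as "one digit".

```java
import java.util.regex.Pattern;

public class BackslashDemo {
    // Java source "\\d" -> the two-character string \d -> regex meaning "one digit".
    static final String DIGIT_REGEX = "\\d";
    // Java source "\\\\" -> the two-character string \\ -> regex meaning "one literal backslash".
    static final String BACKSLASH_REGEX = "\\\\";

    public static void main(String[] args) {
        System.out.println(DIGIT_REGEX.length());                   // prints 2
        System.out.println(Pattern.matches(DIGIT_REGEX, "7"));      // prints true
        System.out.println(Pattern.matches(BACKSLASH_REGEX, "\\")); // prints true
    }
}
```

This is also why a debugger shows the pattern with single backslashes: it displays the string's contents after the compiler has already processed the escapes.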
I highly recommend you take maybe an hour to go through the following free regex tutorial:
http://regexone.com/
It's interactive and a piece of cake to get through. When you finish, I guarantee you'll understand them 100x better.
To second Jens, it's probably a better idea to use an HTML parser than to use regexes for this. You might check out jsoup; it's what I use.
The character / does not have any special meaning in the Java regular expression syntax. It is just that: the literal /.
The metacharacters supported by the Java RegExp API are:
<([{\^-=$!|]})?*+.>
See here: http://docs.oracle.com/javase/tutorial/essential/regex/literals.html
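A quick sketch to see the difference between a literal like / and a metacharacter like . in practice (the class and method names here are invented for the example). It also shows why the unescaped . before html in the question's pattern is looser than intended.

```java
import java.util.regex.Pattern;

public class LiteralSlashDemo {
    // Hypothetical helper: true only if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // "/" is an ordinary literal, so "/*" just means "zero or more slashes".
        System.out.println(matchesWhole("/*", "///"));                // prints true
        System.out.println(matchesWhole("/", "a"));                   // prints false

        // "." is a metacharacter: unescaped, it accepts any single character.
        System.out.println(matchesWhole("\\d+.html", "123xhtml"));    // prints true
        // Escaped as "\\.", it accepts only a literal dot.
        System.out.println(matchesWhole("\\d+\\.html", "123xhtml"));  // prints false
        System.out.println(matchesWhole("\\d+\\.html", "123.html"));  // prints true
    }
}
```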