Using regex to grab text between multiple lines
I am trying to grab the word Juwelier that comes just before the closing </a> tag in this HTML page code.
I am not very good with RegEx, especially when it has to work across multiple lines.
Things that will NOT be dynamic:
<p>Rubriek:
class="category"
<p> , </p> , <a> , </a>
This is the HTML page code:
<p>Rubriek:
<a href="http://www.detelefoongids.nl/juwelier/4-1/?oWhat=Juwelier"
title="Juwelier"
class="category">
Juwelier
</a>
</p>
The regex below is one of many that you could use.
It uses zero-width positive look-behind (?<=...) and look-ahead (?=...) assertions to locate the target string, so only the word itself ends up in the match. Note that the look-behind here is variable-length, which .NET supports but many other regex engines do not.
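' At the top of the file:
Imports System.Text.RegularExpressions

' Sample input: the HTML from the question, with a vbCrLf between lines.
Dim str As String =
    "<p>Rubriek:" & vbCrLf &
    " <a href=""http://www.detelefoongids.nl/juwelier/4-1/?oWhat=Juwelier""" & vbCrLf &
    " title=""Juwelier""" & vbCrLf &
    " class=""category"">" & vbCrLf &
    " Juwelier" & vbCrLf &
    " </a>" & vbCrLf &
    "</p>"

' Look-behind: "<p>Rubriek:" up to class="category">, then any non-word characters.
' Match: the word itself. Look-ahead: non-word characters followed by </a>.
Dim match As Match = Regex.Match(str,
    "(?<=<p>Rubriek:[^>]+?class=""category"">\W*)\w+(?=\W*</a>)")
If match.Success Then
    MsgBox(match.Value) ' shows "Juwelier"
End If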
Although it is not used above, an important thing to remember when matching over multiple lines is to enable Single-line mode if you are going to use the wildcard metacharacter . so that it matches every character, including newlines. By default . matches any character except \n.
This can be specified using RegexOptions.Singleline or by putting (?s) at the start of the regex.
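As a rough illustration of the point about Single-line mode, here is a minimal sketch that runs a pattern containing . against the HTML from the question, with and without the option. The module name, the variable names and the pattern itself are assumptions made up for this example. It uses a plain capturing group rather than the look-arounds shown above.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module SinglelineDemo
    Sub Main()
        ' Same markup as in the question, joined with CR/LF line breaks.
        Dim html As String =
            "<p>Rubriek:" & vbCrLf &
            "<a href=""http://www.detelefoongids.nl/juwelier/4-1/?oWhat=Juwelier""" & vbCrLf &
            "title=""Juwelier""" & vbCrLf &
            "class=""category"">" & vbCrLf &
            "Juwelier" & vbCrLf &
            "</a>" & vbCrLf &
            "</p>"

        ' The lazy .*? has to cross several line breaks to get from
        ' "Rubriek:" to class="category">.
        Dim pattern As String = "<p>Rubriek:.*?class=""category"">\s*(\w+)\s*</a>"

        ' Default options: "." does not match \n, so the pattern cannot span lines.
        Dim withoutFlag As Match = Regex.Match(html, pattern)
        Console.WriteLine(withoutFlag.Success) ' False

        ' Singleline: "." also matches \n, so .*? can run across the line breaks.
        Dim withFlag As Match = Regex.Match(html, pattern, RegexOptions.Singleline)
        If withFlag.Success Then
            Console.WriteLine(withFlag.Groups(1).Value) ' Juwelier
        End If

        ' Inline form of the same option.
        Dim inlineFlag As Match = Regex.Match(html, "(?s)" & pattern)
        Console.WriteLine(inlineFlag.Success) ' True
    End Sub
End Module
```

The lazy quantifier .*? matters here: in Single-line mode a greedy .* would first run to the end of the input and then backtrack, which is slower and can overshoot if the page contains more than one category link.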
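\w+ is used to match one or more word characters, i.e. a-zA-Z0-9_ (in .NET also Unicode letters and digits, unless RegexOptions.ECMAScript is set).
\W* is used to match zero or more non-word characters, which includes spaces and line breaks.
[^>] is used to match any character that is not >. Because that includes line breaks, [^>]+? can span the lines between <p>Rubriek: and class="category"> without Single-line mode.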
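Since the look-around version is only one of many possible patterns, here is a hedged sketch of another one: the same \w+, \W* and [^>] building blocks, with the wanted word placed in a named capturing group instead of between assertions. The group name category and the module name are assumptions chosen for this example.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module CaptureGroupDemo
    Sub Main()
        ' Same markup as in the question, joined with CR/LF line breaks.
        Dim html As String =
            "<p>Rubriek:" & vbCrLf &
            "<a href=""http://www.detelefoongids.nl/juwelier/4-1/?oWhat=Juwelier""" & vbCrLf &
            "title=""Juwelier""" & vbCrLf &
            "class=""category"">" & vbCrLf &
            "Juwelier" & vbCrLf &
            "</a>" & vbCrLf &
            "</p>"

        ' [^>]+ runs across the line breaks up to class="category">,
        ' \W* skips the whitespace around the word,
        ' and (?<category>\w+) captures the word itself.
        Dim pattern As String =
            "<p>Rubriek:[^>]+class=""category"">\W*(?<category>\w+)\W*</a>"

        Dim m As Match = Regex.Match(html, pattern)
        If m.Success Then
            Console.WriteLine(m.Groups("category").Value) ' Juwelier
        End If
    End Sub
End Module
```

Both this pattern and the look-around one capture a single run of word characters. A category name containing a space or a hyphen would need a wider expression than \w+ for the captured part.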