Regex \n doesn't work
I am trying to parse text that spans two lines of HTML.
Dim PattStats As New Regex("class=""head"">(.+?)</td>" +
                           "\n<td>(.+?)</td>")
Dim makor As MatchCollection = PattStats.Matches(page)
For Each MatchMak As Match In makor
    ListView3.Items.Add(MatchMak.Groups(1).Value)
Next
I added \n to match the next line, but for some reason it doesn't work. This is the source I am running the regex against.
<table class="table table-striped table-bordered table-condensed">
<tbody>
<tr>
<td class="head">Health Points:</td>
<td>445 (+85 / per level)</td>
<td class="head">Health Regen:</td>
<td>7.25</td>
</tr>
<tr>
<td class="head">Energy:</td>
<td>200</td>
<td class="head">Energy Regen:</td>
<td>50</td>
</tr>
<tr>
<td class="head">Damage:</td>
<td>53 (+3.2 / per level)</td>
<td class="head">Attack Speed:</td>
<td>0.694 (+3.1 / per level)</td>
</tr>
<tr>
<td class="head">Attack Range:</td>
<td>125</td>
<td class="head">Movement Speed:</td>
<td>325</td>
</tr>
<tr>
<td class="head">Armor:</td>
<td>16.5 (+3.5 / per level)</td>
<td class="head">Magic Resistance:</td>
<td>30 (+1.25 / per level)</td>
</tr>
<tr>
<td class="head">Influence Points (IP):</td>
<td>3150</td>
<td class="head">Riot Points (RP):</td>
<td>975</td>
</tr>
</tbody>
</table>
I want to match the first <td class...> and the line after it in a single regex :/
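This regex finds td tags and returns them in groups of two.
<td\b[^>]*>([^<]*)<\/td>[^<]*<td\b[^>]*>([^<]*)<\/td>
<td\b[^>]*> finds the first td tag and consumes any attributes
([^<]*) captures the first inner text; this can be greedy, because we assume the cell has no nested tags
<\/td> finds the closing tag
[^<]* skips all the remaining text up to the next tag; this assumes there are no other tags between the first and second td tags
<td\b[^>]*> finds the second td tag and consumes any attributes
([^<]*) captures the second inner text; this can be greedy, because we assume the cell has no nested tags
<\/td> finds the closing tag
Group 0 holds the entire matched string; groups 1 and 2 hold the two captured cell texts.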
Imports System
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Replace with the HTML you want to scan.
        Dim sourcestring As String = "replace with your source string"
        ' Two adjacent <td> cells: group 1 is the first cell's text, group 2 the second's.
        Dim re As Regex = New Regex("<td\b[^>]*>([^<]*)<\/td>[^<]*<td\b[^>]*>([^<]*)<\/td>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim mc As MatchCollection = re.Matches(sourcestring)
        Dim groupNames As String() = re.GetGroupNames()
        Dim mIdx As Integer = 0
        For Each m As Match In mc
            ' Print every group of this match, including group 0 (the whole match).
            For groupIdx As Integer = 0 To m.Groups.Count - 1
                Console.WriteLine("[{0}][{1}] = {2}", mIdx, groupNames(groupIdx), m.Groups(groupIdx).Value)
            Next
            mIdx += 1
        Next
    End Sub
End Module
$matches Array:
(
[0] => Array
(
[0] => <td class="head">Health Points:</td>
<td>445 (+85 / per level)</td>
[1] => <td class="head">Health Regen:</td>
<td>7.25</td>
[2] => <td class="head">Energy:</td>
<td>200</td>
[3] => <td class="head">Energy Regen:</td>
<td>50</td>
[4] => <td class="head">Damage:</td>
<td>53 (+3.2 / per level)</td>
[5] => <td class="head">Attack Speed:</td>
<td>0.694 (+3.1 / per level)</td>
[6] => <td class="head">Attack Range:</td>
<td>125</td>
[7] => <td class="head">Movement Speed:</td>
<td>325</td>
[8] => <td class="head">Armor:</td>
<td>16.5 (+3.5 / per level)</td>
[9] => <td class="head">Magic Resistance:</td>
<td>30 (+1.25 / per level)</td>
[10] => <td class="head">Influence Points (IP):</td>
<td>3150</td>
[11] => <td class="head">Riot Points (RP):</td>
<td>975</td>
)
[1] => Array
(
[0] => Health Points:
[1] => Health Regen:
[2] => Energy:
[3] => Energy Regen:
[4] => Damage:
[5] => Attack Speed:
[6] => Attack Range:
[7] => Movement Speed:
[8] => Armor:
[9] => Magic Resistance:
[10] => Influence Points (IP):
[11] => Riot Points (RP):
)
[2] => Array
(
[0] => 445 (+85 / per level)
[1] => 7.25
[2] => 200
[3] => 50
[4] => 53 (+3.2 / per level)
[5] => 0.694 (+3.1 / per level)
[6] => 125
[7] => 325
[8] => 16.5 (+3.5 / per level)
[9] => 30 (+1.25 / per level)
[10] => 3150
[11] => 975
)
)
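If the goal is the one from the question, listing each label together with its value, groups 1 and 2 can be read directly from each match. The following is only a minimal sketch: the two-row sample string and the module name PairDemo are made up for illustration, and Console.WriteLine stands in for whatever control (such as the ListView in the question) should receive the text.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module PairDemo
    Sub Main()
        ' Hypothetical sample; vbCrLf stands in for the page's line breaks.
        Dim page As String = "<td class=""head"">Energy:</td>" & vbCrLf &
                             "<td>200</td>" & vbCrLf &
                             "<td class=""head"">Energy Regen:</td>" & vbCrLf &
                             "<td>50</td>"

        ' Same pattern as above: two adjacent td cells per match.
        Dim re As New Regex("<td\b[^>]*>([^<]*)<\/td>[^<]*<td\b[^>]*>([^<]*)<\/td>",
                            RegexOptions.IgnoreCase)

        For Each m As Match In re.Matches(page)
            ' Group 1 is the label cell, group 2 is the value cell.
            Console.WriteLine("{0} {1}", m.Groups(1).Value, m.Groups(2).Value)
        Next
    End Sub
End Module
```

With this sample, the loop prints one line per pair: "Energy: 200" and "Energy Regen: 50".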
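Parsing HTML with a regular expression is really not the best solution, because there are many edge cases you cannot foresee. In this case, though, if the input string is always this basic and you are willing to accept the risk that the regex will not work 100% of the time, this solution may work for you.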
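One such edge case may be what broke the original \n pattern, although the post does not show the raw page, so this is an assumption: if the lines end in CR LF, or the second cell is indented, a bare \n placed directly between </td> and <td> cannot match, because in .NET \n matches only the line feed character. The sketch below uses a made-up sample string and a hypothetical module name NewlineDemo to compare the strict pattern with a variant that uses \s* instead.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module NewlineDemo
    Sub Main()
        ' Hypothetical sample with a Windows line ending (CR LF) and indentation.
        Dim page As String = "<td class=""head"">Armor:</td>" & vbCrLf &
                             "    <td>16.5 (+3.5 / per level)</td>"

        ' The question's approach: a bare \n directly between the two cells.
        Dim strict As New Regex("class=""head"">(.+?)</td>\n<td>(.+?)</td>")
        ' Same pattern with \s*, which accepts CR, LF, spaces and tabs.
        Dim tolerant As New Regex("class=""head"">(.+?)</td>\s*<td>(.+?)</td>")

        Console.WriteLine(strict.IsMatch(page))    ' False
        Console.WriteLine(tolerant.IsMatch(page))  ' True
    End Sub
End Module
```

The [^<]* between the two cells in the answer's pattern plays the same role as \s* here, which is why that pattern is not sensitive to the kind of line break in the page.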