I'm trying to parse text out of two lines of HTML.
Dim PattStats As New Regex("class=""head"">(.+?)</td>"+
"\n<td>(.+?)</td>")
Dim makor As MatchCollection = PattStats.Matches(page)
For Each MatchMak As Match In makor
ListView3.Items.Add(MatchMak.Groups(1).Value)
Next
I added the \n to match the next line, but for some reason it won't work. Here's the source I'm running the regex against:
<table class="table table-striped table-bordered table-condensed">
<tbody>
<tr>
<td class="head">Health Points:</td>
<td>445 (+85 / per level)</td>
<td class="head">Health Regen:</td>
<td>7.25</td>
</tr>
<tr>
<td class="head">Energy:</td>
<td>200</td>
<td class="head">Energy Regen:</td>
<td>50</td>
</tr>
<tr>
<td class="head">Damage:</td>
<td>53 (+3.2 / per level)</td>
<td class="head">Attack Speed:</td>
<td>0.694 (+3.1 / per level)</td>
</tr>
<tr>
<td class="head">Attack Range:</td>
<td>125</td>
<td class="head">Movement Speed:</td>
<td>325</td>
</tr>
<tr>
<td class="head">Armor:</td>
<td>16.5 (+3.5 / per level)</td>
<td class="head">Magic Resistance:</td>
<td>30 (+1.25 / per level)</td>
</tr>
<tr>
<td class="head">Influence Points (IP):</td>
<td>3150</td>
<td class="head">Riot Points (RP):</td>
<td>975</td>
</tr>
</tbody>
</table>
I'd like to match the first <td class...> and the following line in one regex. :/
This regex finds td tags and returns them in pairs (a label cell and the cell that follows it):
<td\b[^>]*>([^<]*)<\/td>[^<]*<td\b[^>]*>([^<]*)<\/td>
<td\b[^>]*> : find the first td tag and consume any attributes
([^<]*) : capture the first inner text; this can be greedy because we assume the cell has no nested tags
<\/td> : find the close tag
[^<]* : move past the rest of the text (whitespace, line breaks) until the next tag; this assumes there are no additional tags between the first and second td tag
<td\b[^>]*> : find the second td tag and consume any attributes
([^<]*) : capture the second inner text, again assuming the cell has no nested tags
<\/td> : find the close tag
Group 0 gets the entire match (both cells), group 1 gets the text of the first cell and group 2 the text of the second.
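The [^<]* between the two cells is probably also why this works where the question's \n did not: it swallows whatever sits between </td> and the next <td>, while </td>\n<td> demands a single newline immediately after </td> and <td> immediately after that newline. A likely culprit is a Windows \r\n line ending or indentation in the downloaded page; that is an assumption, since the raw bytes of the page aren't shown. The sketch below compares the question's pattern with a variant that uses \s* instead of \n. The module name LineEndingDemo and the sample string are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module LineEndingDemo
    ' Hypothetical sample: one label/value pair separated by a Windows CRLF line ending
    Public ReadOnly Sample As String = "<td class=""head"">Health Points:</td>" & vbCrLf &
        "<td>445 (+85 / per level)</td>"

    ' The question's pattern: exactly one \n must sit between </td> and <td>
    Public ReadOnly NewlineOnly As New Regex("class=""head"">(.+?)</td>\n<td>(.+?)</td>")

    ' Variant: \s* accepts \r\n, \n, spaces, tabs or nothing between the cells
    Public ReadOnly AnyWhitespace As New Regex("class=""head"">(.+?)</td>\s*<td>(.+?)</td>")

    Sub Main()
        ' The \r in front of the \n makes the question's pattern fail on this input
        Console.WriteLine("NewlineOnly matches: {0}", NewlineOnly.IsMatch(Sample))

        Dim m As Match = AnyWhitespace.Match(Sample)
        If m.Success Then
            ' Group 1 = label, group 2 = value
            Console.WriteLine("{0} {1}", m.Groups(1).Value, m.Groups(2).Value)
        End If
    End Sub
End Module
```

Note that the question's loop only reads Groups(1); the value is in Groups(2).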
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Replace this with the downloaded HTML (the page variable in the question)
        Dim sourcestring As String = "replace with your source string"
        ' Group 1 = text of the first <td>, group 2 = text of the <td> that follows it
        Dim re As New Regex("<td\b[^>]*>([^<]*)<\/td>[^<]*<td\b[^>]*>([^<]*)<\/td>",
                            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim mc As MatchCollection = re.Matches(sourcestring)
        Dim mIdx As Integer = 0
        For Each m As Match In mc
            ' Print every group of every match as [match index][group name] = value
            For groupIdx As Integer = 0 To m.Groups.Count - 1
                Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
            Next
            mIdx += 1
        Next
    End Sub
End Module
The resulting matches, listed by capture group (0 = whole match, 1 = first cell, 2 = second cell):
$matches Array:
(
[0] => Array
(
[0] => <td class="head">Health Points:</td>
<td>445 (+85 / per level)</td>
[1] => <td class="head">Health Regen:</td>
<td>7.25</td>
[2] => <td class="head">Energy:</td>
<td>200</td>
[3] => <td class="head">Energy Regen:</td>
<td>50</td>
[4] => <td class="head">Damage:</td>
<td>53 (+3.2 / per level)</td>
[5] => <td class="head">Attack Speed:</td>
<td>0.694 (+3.1 / per level)</td>
[6] => <td class="head">Attack Range:</td>
<td>125</td>
[7] => <td class="head">Movement Speed:</td>
<td>325</td>
[8] => <td class="head">Armor:</td>
<td>16.5 (+3.5 / per level)</td>
[9] => <td class="head">Magic Resistance:</td>
<td>30 (+1.25 / per level)</td>
[10] => <td class="head">Influence Points (IP):</td>
<td>3150</td>
[11] => <td class="head">Riot Points (RP):</td>
<td>975</td>
)
[1] => Array
(
[0] => Health Points:
[1] => Health Regen:
[2] => Energy:
[3] => Energy Regen:
[4] => Damage:
[5] => Attack Speed:
[6] => Attack Range:
[7] => Movement Speed:
[8] => Armor:
[9] => Magic Resistance:
[10] => Influence Points (IP):
[11] => Riot Points (RP):
)
[2] => Array
(
[0] => 445 (+85 / per level)
[1] => 7.25
[2] => 200
[3] => 50
[4] => 53 (+3.2 / per level)
[5] => 0.694 (+3.1 / per level)
[6] => 125
[7] => 325
[8] => 16.5 (+3.5 / per level)
[9] => 30 (+1.25 / per level)
[10] => 3150
[11] => 975
)
)
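To get from these matches to what the question is after (each label together with its value), one option is to collect groups 1 and 2 of every match into a dictionary. This is a sketch using the answer's pattern; the names StatsParser and ParseStats are hypothetical, and trimming the text is a choice of this example, not something the answer does.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module StatsParser
    ' The answer's pattern: group 1 = first cell's text, group 2 = following cell's text
    Private ReadOnly CellPair As New Regex("<td\b[^>]*>([^<]*)<\/td>[^<]*<td\b[^>]*>([^<]*)<\/td>",
                                           RegexOptions.IgnoreCase)

    ' Hypothetical helper: maps each label to its value, e.g. "Energy:" -> "200"
    Public Function ParseStats(html As String) As Dictionary(Of String, String)
        Dim stats As New Dictionary(Of String, String)()
        For Each m As Match In CellPair.Matches(html)
            stats(m.Groups(1).Value.Trim()) = m.Groups(2).Value.Trim()
        Next
        Return stats
    End Function

    Sub Main()
        ' Two label/value pairs from the question's table, joined by line feeds
        Dim html As String = "<tr><td class=""head"">Energy:</td>" & vbLf &
                             "<td>200</td><td class=""head"">Energy Regen:</td>" & vbLf & "<td>50</td></tr>"
        For Each pair As KeyValuePair(Of String, String) In ParseStats(html)
            Console.WriteLine("{0} {1}", pair.Key, pair.Value)
        Next
    End Sub
End Module
```

In the question's form, assuming ListView3 is in Details view with two columns, each pair could then be added with ListView3.Items.Add(pair.Key).SubItems.Add(pair.Value).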
Parsing HTML with a regex is really not the best solution, as there are a ton of edge cases that we can't predict (nested tags inside a cell, a > inside an attribute value, extra tags between the two cells, and so on). However, if the input string is always this basic, and you're willing to accept the risk of the regex not working 100% of the time, this solution would probably work for you.
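If the markup is as clean as the table in the question, one way to sidestep the regex edge cases is to load it as XML with System.Xml.Linq and walk the cells. This is a swapped-in technique, not part of the answer: XElement.Parse is an XML parser, not an HTML parser, so it throws an XmlException on typical real-world HTML (unclosed tags, entities such as &nbsp;), where a dedicated HTML parsing library is the usual choice. TableReader and ReadStats are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module TableReader
    ' Hypothetical helper. Only works when the fragment is well-formed XML,
    ' as the question's table is; XElement.Parse throws otherwise.
    Public Function ReadStats(tableHtml As String) As Dictionary(Of String, String)
        Dim table As XElement = XElement.Parse(tableHtml)
        Dim stats As New Dictionary(Of String, String)()
        ' Every <td class="head"> is a label; the next <td> sibling holds its value
        For Each headCell As XElement In table.Descendants("td").
                Where(Function(td As XElement) CStr(td.Attribute("class")) = "head")
            Dim valueCell As XElement = headCell.ElementsAfterSelf("td").FirstOrDefault()
            If valueCell IsNot Nothing Then stats(headCell.Value.Trim()) = valueCell.Value.Trim()
        Next
        Return stats
    End Function

    Sub Main()
        Dim html As String = "<table><tbody><tr><td class=""head"">Attack Range:</td><td>125</td>" &
                             "<td class=""head"">Movement Speed:</td><td>325</td></tr></tbody></table>"
        For Each pair As KeyValuePair(Of String, String) In ReadStats(html)
            Console.WriteLine("{0} {1}", pair.Key, pair.Value)
        Next
    End Sub
End Module
```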