I want to extract all table rows from an HTML page, but the pattern @"<tr>([\w\W]*)</tr>" is not working. It gives a single result that runs from the first occurrence of <tr> to the last occurrence of </tr>, but I want every occurrence of <tr>...</tr>. Can anyone please tell me how I can do this?
[\w\W]* matches greedily, so it will match from the first <tr> to the last </tr>.
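To make the greedy behaviour concrete, here is a minimal sketch. The sample HTML string and the `GreedyDemo` class are made up for illustration and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Hypothetical sample input with two rows (not from the original question).
    public const string Html =
        "<table><tr><td>A</td></tr>\n<tr><td>B</td></tr></table>";

    public static MatchCollection GreedyMatches(string html)
    {
        // [\w\W]* is greedy: it first consumes everything to the end of the
        // input, then backtracks only as far as the LAST </tr>.
        return Regex.Matches(html, @"<tr>([\w\W]*)</tr>");
    }

    public static void Main()
    {
        MatchCollection matches = GreedyMatches(Html);

        // Both rows are swallowed by a single match.
        Console.WriteLine(matches.Count); // prints 1

        // The captured group spans from inside the first row to inside the last.
        Console.WriteLine(matches[0].Groups[1].Value);
    }
}
```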
A regex approach won't work well because HTML is not a regular language. If you really want to, you could try a lazy quantifier such as "<tr>(.*?)</tr>" with the RegexOptions.Singleline flag; however, this isn't guaranteed to work in all cases.
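As a rough sketch of the lazy pattern with RegexOptions.Singleline, and of one case where it silently misses rows, something like the following could be used. The `LazyRowDemo` class, the `ExtractRows` helper, and the sample strings are hypothetical names and data chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LazyRowDemo
{
    // Returns the inner content of every <tr>...</tr> pair the pattern finds.
    public static List<string> ExtractRows(string html)
    {
        var rows = new List<string>();

        // .*? is lazy, so each match stops at the nearest </tr>.
        // Singleline lets '.' match newlines, so rows may span several lines.
        foreach (Match m in Regex.Matches(html, "<tr>(.*?)</tr>", RegexOptions.Singleline))
        {
            rows.Add(m.Groups[1].Value);
        }
        return rows;
    }

    public static void Main()
    {
        // Hypothetical input: two plain rows, one of them split across lines.
        string simple = "<table><tr><td>A</td></tr>\n<tr>\n<td>B</td></tr></table>";
        Console.WriteLine(ExtractRows(simple).Count); // prints 2

        // A row whose opening tag carries an attribute is not matched at all,
        // because the pattern only looks for the literal text "<tr>".
        string withAttribute = "<table><tr class=\"odd\"><td>A</td></tr></table>";
        Console.WriteLine(ExtractRows(withAttribute).Count); // prints 0
    }
}
```

The second input is one example of why the regex route is fragile: perfectly valid HTML can defeat a pattern that worked on the first page you tried.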
For parsing HTML you need an HTML parser. Try HTML Agility Pack.
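A minimal sketch of the parser-based approach, assuming the HtmlAgilityPack NuGet package is referenced by the project. The `AgilityPackDemo` class, the `ExtractRows` helper, and the sample HTML are hypothetical and only for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class AgilityPackDemo
{
    // Returns the inner HTML of every <tr> element in the document.
    public static List<string> ExtractRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<string>();

        // "//tr" is an XPath query selecting every <tr> element at any depth.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//tr");
        if (nodes == null)
        {
            return rows;
        }

        foreach (HtmlNode tr in nodes)
        {
            rows.Add(tr.InnerHtml);
        }
        return rows;
    }

    public static void Main()
    {
        // Hypothetical input: one row with an attribute, one without.
        string html =
            "<table><tr class=\"odd\"><td>A</td></tr><tr><td>B</td></tr></table>";

        foreach (string row in ExtractRows(html))
        {
            Console.WriteLine(row);
        }
    }
}
```

Because the parser works on elements rather than on literal text, attributes on the opening tag and rows spanning several lines need no special handling.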
I agree with Mark: you should use the HTML Agility Pack library.
About your regex, you should go with something like:
@"<tr>([\s\S]*?)</tr>"
That's a non-greedy pattern, and you should get one match for every TR.