C# Regex Problem

Question

I want to extract all table rows from an HTML page, but the pattern @"<tr>([\\w\\W]*)</tr>" is not working. It gives a single result that runs from the first occurrence of <tr> to the last occurrence of </tr>, but I want every <tr>...</tr> occurrence. Can anyone please tell me how I can do this?

Answer 1

[\\w\\W]* matches greedily, so it will match from the first <tr> to the last </tr>.

A regex approach won't work well because HTML is not a regular language. If you really want to try, you could use a lazy quantifier such as "<tr>(.*?)</tr>" with the RegexOptions.Singleline flag; however, this isn't guaranteed to work in all cases.
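To make the caveat concrete, here is a minimal sketch of the lazy pattern with RegexOptions.Singleline. The class name LazyRowDemo and the method Rows are hypothetical names chosen for illustration; only Regex, RegexOptions.Singleline, and the pattern itself come from the answer.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LazyRowDemo
{
    // Lazy quantifier (.*?) stops at the nearest </tr>;
    // Singleline lets '.' match newline characters as well.
    private static readonly Regex RowPattern =
        new Regex("<tr>(.*?)</tr>", RegexOptions.Singleline);

    // Returns the captured contents of every <tr>...</tr> match.
    public static string[] Rows(string html) =>
        RowPattern.Matches(html)
                  .Cast<Match>()
                  .Select(m => m.Groups[1].Value)
                  .ToArray();
}
```

For two plain rows, even when one spans several lines, Rows returns two entries. For input such as <tr class="odd"><td>1</td></tr><tr><td>2</td></tr> it returns only the second row, because the literal <tr> in the pattern does not match a tag that carries attributes. That is one of the cases where the regex quietly gives the wrong answer.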

For parsing HTML you need an HTML parser. Try HTML Agility Pack.
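A rough sketch of what that could look like, assuming the HtmlAgilityPack NuGet package is referenced in the project. RowExtractor and GetRows are hypothetical names; HtmlDocument, LoadHtml, DocumentNode, SelectNodes, and InnerHtml are part of the library.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper, for illustration only.
public static class RowExtractor
{
    // Returns the inner HTML of every <tr> element in the document.
    public static List<string> GetRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The XPath "//tr" selects every <tr> at any depth.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null)
        {
            return new List<string>();
        }

        return rows.Select(row => row.InnerHtml).ToList();
    }
}
```

Because the parser works on elements rather than on text, rows with attributes such as <tr class="odd"> are found as well. Note that "//tr" also returns rows of nested tables, so a narrower XPath may be needed depending on the page.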

Answer 2

I agree with Mark: you should use the HTML Agility Pack library.

As for your regex, you should go with something like:

@"<tr>([\s\S]*?)</tr>"

That's a non-greedy pattern, and you should get one match for every TR.
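The difference between the two quantifiers can be checked with a small side-by-side sketch. The class GreedyVsLazy and its methods are hypothetical names for illustration; the lazy pattern is the one given above, and the greedy variant differs only by the missing ? after the *.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class GreedyVsLazy
{
    // Runs the pattern and collects the first capture group of each match.
    private static string[] Capture(string html, string pattern) =>
        Regex.Matches(html, pattern)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    // Greedy: * consumes as much as possible, so the match ends at the LAST </tr>.
    public static string[] Greedy(string html) =>
        Capture(html, @"<tr>([\s\S]*)</tr>");

    // Lazy: *? consumes as little as possible, so each match ends at the NEXT </tr>.
    public static string[] Lazy(string html) =>
        Capture(html, @"<tr>([\s\S]*?)</tr>");
}
```

With the input <tr>a</tr><tr>b</tr>, Greedy yields a single capture, a</tr><tr>b, while Lazy yields two captures, a and b. No RegexOptions are needed here because [\s\S] already matches newlines.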

2 answers

solution1
5 2011-02-04 22:55:52

solution2
2 ACCEPTED 2011-02-04 23:00:10
