Why does my regex not match on html tags spanning multiple lines even when RegexOptions.Multiline is used?

Question

I am attempting to strip the tags out of the following text:

<P style="MARGIN: 0in 0in 0pt" class=MsoNormal><SPAN 
style="COLOR: #1f497d"><FONT size=3 face=Calibri> </FONT></SPAN></P>

Notice how it's on two lines. So when I try to use:

Regex _html = new Regex("<.*?>", RegexOptions.Multiline);
tempHtml = _html.Replace(tempHtml, string.Empty);

It matches the <P>, <FONT>, </FONT>, </SPAN>, and </P> tags but does NOT seem to match the <SPAN> tag.

What am I doing wrong?

Answer 1

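RegexOptions.Multiline changes the meaning of ^ and $ so that they match at the start and end of every line instead of only at the start and end of the whole input. It has no effect on the dot. RegexOptions.Singleline controls whether . matches every character including the line feed (option set) or every character except the line feed (option not set).

So use RegexOptions.Singleline if your tags can contain line feeds.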
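
The difference between the two options can be checked with a small console program. This is only a sketch: the class name TagStripDemo, the helper StripTags, and the shortened input string are made up for illustration and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripDemo
{
    // Hypothetical helper: removes everything matched by "<.*?>" under the given options.
    public static string StripTags(string html, RegexOptions options)
    {
        return Regex.Replace(html, "<.*?>", string.Empty, options);
    }

    public static void Main()
    {
        // Shortened stand-in for the input in the question; the SPAN tag contains a line feed.
        string html = "<P class=MsoNormal><SPAN \nstyle=\"COLOR: #1f497d\">text</SPAN></P>";

        // Multiline only changes ^ and $, so "." still stops at '\n'
        // and the SPAN tag that spans two lines is left in place.
        string multiline = StripTags(html, RegexOptions.Multiline);
        Console.WriteLine(multiline.Contains("<SPAN")); // True

        // Singleline lets "." match '\n', so every tag is removed.
        string singleline = StripTags(html, RegexOptions.Singleline);
        Console.WriteLine(singleline); // text
    }
}
```

The pattern stays lazy (.*?) in both cases, so with Singleline each match still ends at the first > after the opening <.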

That said, I agree with Bryan Crosby's comment, which advised using the HtmlAgilityPack to parse HTML instead of a regex.

Answer 2

I'm not sure how regex works in C#, but most of the time you have to escape the < and >. This pattern worked for me in PHP:

\<.*?\>

Answer 3

I think what Fischermaen is trying to say is that you're using the wrong option. Use single-line mode:

Regex _html = new Regex("<.*?>", RegexOptions.Singleline);
tempHtml = _html.Replace(tempHtml, string.Empty);

Then go download Expresso and you can easily try this stuff out and test your expression.

3 answers

solution1
3 2011-12-13 19:51:56

solution2
1 ACCEPTED 2011-12-13 19:52:49

solution3
1 2011-12-13 19:57:15
