I am trying to strip the tags out of the following text:
<P style="MARGIN: 0in 0in 0pt" class=MsoNormal><SPAN
style="COLOR: #1f497d"><FONT size=3 face=Calibri> </FONT></SPAN></P>
Notice how it's on two lines. So when I try to use:
Regex _html = new Regex("<.*?>", RegexOptions.Multiline);
tempHtml = _html.Replace(tempHtml, string.Empty);
It matches the <p>, <font>, </font>, </span> and </p> tags but does NOT seem to match the <span> tag.
What am I doing wrong?
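RegexOptions.Multiline only changes the meaning of ^ and $, so that they match at the start and end of every line instead of only at the start and end of the whole string (\A and \Z are not affected by it). RegexOptions.Singleline changes whether . matches every character including linefeed (option set) or every character except linefeed (option not set).
So use RegexOptions.Singleline if you want a match to be able to span the linefeed inside your tags.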
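To make the difference concrete, here is a small sketch that runs the pattern from the question with each option. The class name TagStripDemo, the Strip helper and the shortened sample string are made up for illustration; only Regex.Replace and the two RegexOptions values come from .NET itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripDemo
{
    // Shortened sample modeled on the question: the <SPAN ...> tag contains a linefeed.
    public const string Sample =
        "<P class=MsoNormal><SPAN\nstyle=\"COLOR: #1f497d\"><FONT size=3 face=Calibri>text</FONT></SPAN></P>";

    // Hypothetical helper: removes everything that looks like a tag, using the given options.
    public static string Strip(string html, RegexOptions options)
    {
        return Regex.Replace(html, "<.*?>", string.Empty, options);
    }

    public static void Main()
    {
        // Multiline: '.' still stops at the linefeed, so the <SPAN ...> tag is left behind.
        Console.WriteLine(Strip(Sample, RegexOptions.Multiline));

        // Singleline: '.' also matches the linefeed, so every tag is removed.
        Console.WriteLine(Strip(Sample, RegexOptions.Singleline));
    }
}
```

With Multiline the opening SPAN tag should survive, because the lazy .*? cannot cross the line break before it reaches a >; with Singleline only the word text should remain.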
But I agree with the comment from Bryan Crosby, who advised you to use the HtmlAgilityPack for parsing HTML instead of regex.
I'm not sure how regex works in C#, but in some regex flavors you have to escape the < and >. This pattern worked for me in PHP:
\<.*?\>
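A quick way to check whether the escaping matters in C# is to run both patterns against the same input. This is only a sketch, assuming .NET's Regex class, where a backslash in front of a character with no special escape meaning matches that character literally; if that holds, both patterns should give the same result. EscapeCheck and SameResult are illustrative names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeCheck
{
    // Hypothetical helper: strips tags with the unescaped and the escaped
    // pattern and reports whether the two results are identical.
    public static bool SameResult(string input)
    {
        string plain = Regex.Replace(input, "<.*?>", string.Empty, RegexOptions.Singleline);
        string escaped = Regex.Replace(input, @"\<.*?\>", string.Empty, RegexOptions.Singleline);
        return plain == escaped;
    }

    public static void Main()
    {
        // The second tag contains a linefeed, like the <SPAN> tag in the question.
        Console.WriteLine(SameResult("<b>bold</b> and <i\nclass=x>italic</i>"));
    }
}
```

If both calls agree, the missing match in the question comes from the regex option rather than from the unescaped brackets.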
I think what Fischermaen is trying to say is that you're using the wrong option; use single-line mode instead:
Regex _html = new Regex("<.*?>", RegexOptions.Singleline);
tempHtml = _html.Replace(tempHtml, string.Empty);
Then go download Expresso and you can easily try this stuff out and test your expression.