
Why does my regex not match on html tags spanning multiple lines even when RegexOptions.Multiline is used?

I am attempting to strip the tags out of the following text:

<P style=""MARGIN: 0in 0in 0pt"" class=MsoNormal><SPAN 
style=""COLOR: #1f497d""><FONT size=3 face=Calibri> </FONT></SPAN></P>

Notice how it's on two lines. So when I try to use:

Regex _html = new Regex("<.*?>", RegexOptions.Multiline);
tempHtml = _html.Replace(tempHtml, string.Empty);

It matches the <p>, <font>, </font>, </span> and </p> tags but does NOT seem to match the <span> tag.

What am I doing wrong?

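RegexOptions.Multiline only changes the meaning of ^ and $, so that they match at the start and end of every line rather than only at the start and end of the whole string. RegexOptions.Singleline controls what . matches: with the option set, . matches every character including the line feed; without it, . matches every character except the line feed.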

So use RegexOptions.Singleline if your tags can contain line feeds.
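To make the difference concrete, here is a minimal sketch that runs the question's pattern under both options. The class name TagStripDemo, the helper Strip, and the shortened sample string are made up for illustration and are not from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripDemo
{
    // Removes everything the question's pattern matches, using the given options.
    public static string Strip(string html, RegexOptions options)
    {
        return Regex.Replace(html, "<.*?>", string.Empty, options);
    }

    public static void Main()
    {
        // Shortened stand-in for the question's markup; the <SPAN ...> tag contains a line break.
        string html = "<P class=MsoNormal><SPAN \nstyle=\"COLOR: #1f497d\">text</SPAN></P>";

        // Multiline: '.' still cannot cross '\n', so the <SPAN ...> tag is left behind.
        Console.WriteLine(Strip(html, RegexOptions.Multiline));

        // Singleline: '.' also matches '\n', so every tag is removed and only "text" remains.
        Console.WriteLine(Strip(html, RegexOptions.Singleline));
    }
}
```

With Multiline, the match attempt that starts at <SPAN runs into the line feed before it finds a closing >, so that one tag survives while all the others are removed.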

That said, I agree with the comment from Bryan Crosby, who advised you to use HtmlAgilityPack for parsing HTML instead of regex.
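For reference, a rough sketch of what the HtmlAgilityPack route could look like. It assumes the HtmlAgilityPack NuGet package is installed; the class name AgilityPackDemo, the helper ToPlainText, and the sample markup are hypothetical and only loosely follow the question's snippet.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, not part of the .NET base class library

public static class AgilityPackDemo
{
    // Returns the text content of an HTML fragment with all markup removed.
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText concatenates the text nodes; DeEntitize turns entities such as &amp; into characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }

    public static void Main()
    {
        // Illustrative markup with a tag broken across two lines, as in the question.
        string html = "<P class=MsoNormal><SPAN \nstyle=\"COLOR: #1f497d\"><FONT size=3 face=Calibri>Hello &amp; goodbye</FONT></SPAN></P>";
        Console.WriteLine(ToPlainText(html));
    }
}
```

Because the parser tokenizes the markup itself, line breaks inside a tag need no special regex options.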

I'm not sure how regex works in C#, but most of the time you have to escape the < and >. This pattern worked for me in PHP:

\<.*?\>

I think what Fischermaen is trying to say is that you're using the wrong option. Use single-line mode instead:

Regex _html = new Regex("<.*?>", RegexOptions.Singleline);
tempHtml = _html.Replace(tempHtml, string.Empty);

Then go download Expresso and you can easily try this stuff out and test your expression.
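As a side note that is not from the answers above, two other spellings of the same fix may be worth trying in such a tester: a negated character class, which crosses line breaks without any option, and the inline (?s) flag, which switches on single-line mode inside the pattern. The sketch below uses made-up names (AlternativesDemo and its two helpers) and a shortened sample string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternativesDemo
{
    // Negated character class: [^>] matches any character except '>', including '\n'.
    public static string StripWithNegatedClass(string html)
    {
        return Regex.Replace(html, "<[^>]*>", string.Empty);
    }

    // Inline (?s) turns on single-line mode from inside the pattern itself.
    public static string StripWithInlineOption(string html)
    {
        return Regex.Replace(html, "(?s)<.*?>", string.Empty);
    }

    public static void Main()
    {
        // Shortened stand-in for the question's markup; the <SPAN ...> tag contains a line break.
        string html = "<P class=MsoNormal><SPAN \nstyle=\"COLOR: #1f497d\">text</SPAN></P>";

        Console.WriteLine(StripWithNegatedClass(html));
        Console.WriteLine(StripWithInlineOption(html));
    }
}
```

Neither variant copes with a > inside an attribute value, which is one more reason the parser-based approach is the safer choice for arbitrary HTML.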
