Why does my regex not match on html tags spanning multiple lines even when RegexOptions.Multiline is used?

I am attempting to strip the tags out of the following text:

<P style=""MARGIN: 0in 0in 0pt"" class=MsoNormal><SPAN 
style=""COLOR: #1f497d""><FONT size=3 face=Calibri> </FONT></SPAN></P>

Notice how it's on two lines. So when I try to use:

Regex _html = new Regex("<.*?>", RegexOptions.Multiline);
tempHtml = _html.Replace(tempHtml, string.Empty);

It matches the <p>, <font>, </font>, </span> and </p> tags but does NOT seem to match the <span> tag.

What am I doing wrong?

RegexOptions.Multiline only changes the meaning of ^ and $, so that they match at the start and end of each line instead of only at the start and end of the whole input (\A and \Z are not affected). RegexOptions.Singleline changes the meaning of the dot: with it, . matches every character including the linefeed; without it, . matches every character except the linefeed.

So use RegexOptions.Singleline if you want the pattern to match tags that contain a linefeed.
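
To make the difference concrete, here is a minimal sketch, assuming a shortened version of the markup from the question. The class name TagStripDemo and the helper Strip are made up for illustration; only Regex.Replace and RegexOptions come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripDemo
{
    // Shortened version of the question's input (an assumption for this demo);
    // the <SPAN> tag contains a linefeed.
    public const string Sample =
        "<P class=MsoNormal><SPAN \nstyle=\"COLOR: #1f497d\"><FONT size=3 face=Calibri> </FONT></SPAN></P>";

    // Hypothetical helper: removes everything that looks like a tag,
    // using the regex options passed in.
    public static string Strip(string html, RegexOptions options)
    {
        return Regex.Replace(html, "<.*?>", string.Empty, options);
    }

    public static void Main()
    {
        // Multiline only affects ^ and $, so "." still stops at the linefeed
        // and the <SPAN ...> tag is left behind.
        Console.WriteLine("[" + Strip(Sample, RegexOptions.Multiline) + "]");

        // Singleline lets "." match the linefeed, so every tag is removed.
        Console.WriteLine("[" + Strip(Sample, RegexOptions.Singleline) + "]");
    }
}
```

With Multiline the <SPAN ...> tag should survive in the output, while with Singleline only the single space between the FONT tags should remain.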

But I agree with the comment from Bryan Crosby, who advised you to use the HtmlAgilityPack for parsing HTML instead of regex.

I'm not sure how regex works in C#, but most of the time you have to escape the < and >. This pattern worked for me in PHP:

\<.*?\>
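
For what it's worth, in .NET the < and > characters are not regex metacharacters, so as far as I can tell the escaped pattern behaves the same as the unescaped one and does not fix the multi-line tag on its own. A rough sketch, assuming a made-up sample string; EscapeDemo and Strip are illustrative names, not part of any library:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Made-up sample: an opening tag that spans two lines.
    public const string Sample = "<SPAN \nstyle=\"COLOR: #1f497d\">text</SPAN>";

    // Hypothetical helper: strips matches of the given pattern.
    public static string Strip(string html, string pattern, RegexOptions options)
    {
        return Regex.Replace(html, pattern, string.Empty, options);
    }

    public static void Main()
    {
        string plain = "<.*?>";
        string escaped = @"\<.*?\>";

        // Without Singleline, both patterns give the same result:
        // the two-line opening tag is not matched.
        Console.WriteLine(Strip(Sample, plain, RegexOptions.None)
                          == Strip(Sample, escaped, RegexOptions.None));

        // With Singleline, the escaped pattern removes both tags as well.
        Console.WriteLine(Strip(Sample, escaped, RegexOptions.Singleline));
    }
}
```

So the escaping is harmless here, but the option passed to the Regex is what decides whether the two-line tag is matched.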

I think what Fischermaen is trying to say is that you're using the wrong option. Use single-line mode:

Regex _html = new Regex("<.*?>", RegexOptions.Singleline);
tempHtml = _html.Replace(tempHtml, string.Empty);

Then go download Expresso and you can easily try this stuff out and test your expression.
