Why does my regex not match on HTML tags spanning multiple lines even when RegexOptions.Multiline is used?
I am attempting to strip the tags out of the following text:
<P style=""MARGIN: 0in 0in 0pt"" class=MsoNormal><SPAN
style=""COLOR: #1f497d""><FONT size=3 face=Calibri> </FONT></SPAN></P>
Notice how it spans two lines. So when I try to use:
Regex _html = new Regex("<.*?>", RegexOptions.Multiline);
tempHtml = _html.Replace(tempHtml, string.Empty);
It matches the <p>, <font>, </font>, </span> and </p> tags but does NOT seem to match the <span> tag.
What am I doing wrong?
RegexOptions.Multiline only changes the meaning of ^ and $: with it they match at the start and end of every line instead of only at the start and end of the whole input. It has no effect on \A and \Z, and no effect on the dot. RegexOptions.Singleline changes what . means: when it is set, . matches every character including the linefeed; when it is not set, . matches every character except the linefeed.
So use RegexOptions.Singleline if your tags can contain a linefeed.
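To make the difference between the two options concrete, here is a minimal sketch. The sample string is a shortened, made-up version of the markup from the question, and the class and method names (TagStripDemo, Strip) are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

static class TagStripDemo
{
    // Hypothetical sample: the <SPAN ...> tag contains a line break, as in the question.
    public const string Sample =
        "<P class=MsoNormal><SPAN\nstyle=\"COLOR: #1f497d\">text</SPAN></P>";

    // Removes everything that the pattern <.*?> matches under the given options.
    public static string Strip(string html, RegexOptions options)
    {
        return Regex.Replace(html, "<.*?>", string.Empty, options);
    }

    static void Main()
    {
        // Multiline: '.' still stops at '\n', so the split <SPAN ...> tag survives.
        Console.WriteLine(Strip(Sample, RegexOptions.Multiline));

        // Singleline: '.' also matches '\n', so every tag is removed and "text" remains.
        Console.WriteLine(Strip(Sample, RegexOptions.Singleline));
    }
}
```

With Multiline the opening SPAN tag is left in the output because the lazy .*? cannot cross the line break before reaching a >. With Singleline only the word text is left.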
But I agree with the comment from Bryan Crosby, who advised you to use the HtmlAgilityPack for parsing HTML instead of regex.
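For comparison, a rough sketch of the parser-based route might look like the following. It assumes the third-party HtmlAgilityPack NuGet package is referenced; the sample string and the class name ParserDemo are made up for illustration, and this is only one way to use the library, not necessarily what the commenter had in mind.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, not part of .NET itself

static class ParserDemo
{
    // Parses the markup and returns the text content without the tags.
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText concatenates the text nodes and leaves the markup out.
        return doc.DocumentNode.InnerText;
    }

    static void Main()
    {
        // Hypothetical sample with a tag split across two lines.
        string html = "<P class=MsoNormal><SPAN\nstyle=\"COLOR: #1f497d\">Hello</SPAN></P>";
        Console.WriteLine(ExtractText(html));
    }
}
```

Because the parser works on the document structure, line breaks inside a tag do not need any special handling. Note that InnerText returns character entities such as &nbsp; undecoded.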
I'm not sure how regex works with C#, but most of the time you have to escape the < and >. This pattern worked for me in PHP:
\<.*?\>
I think what Fischermaen is trying to say is that you're using the wrong option. Use single-line mode:
Regex _html = new Regex("<.*?>", RegexOptions.Singleline);
tempHtml = _html.Replace(tempHtml, string.Empty);
Then go download Expresso and you can easily try this stuff out and test your expression.