Remove HTML from string
I have the following text, which still contains some HTML code:
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
Hi There,
For the product team to have any chance in analysing this issue we need clarification on how to reproduce the problem.
My code at the moment is:
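using System.Net; using System.Text.RegularExpressions;
// Strips tag-like text; whatever sits between tags (e.g. CSS inside <style>) is kept.
string replacedEmailText = Regex.Replace(emailText, @"<(.|\n)*?>", string.Empty);
string finalText = WebUtility.HtmlDecode(replacedEmailText); // decodes entities such as &amp;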
How do I remove the top lines, such as the following?
v\:* {behavior:url(#default#VML);}
For this specific example, you could use .*;}(\\r\\n|\\r|\\n)* as your replacement pattern.
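A minimal sketch of applying that pattern with Regex.Replace. It assumes the doubled backslashes mean the pattern is written as a regular (non-verbatim) C# string literal, so the regex engine actually sees .*;}(\r\n|\r|\n)*. The class name GenericCssLineStripper is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the answer's generic pattern.
public static class GenericCssLineStripper
{
    // The answer's pattern as a regular (non-verbatim) C# string literal.
    // After C# unescaping, the regex engine sees:  .*;}(\r\n|\r|\n)*
    //   .*             any run of characters on one line (the dot excludes \n)
    //   ;}             the literal ";}" that ends each VML style rule
    //   (\r\n|\r|\n)*  any line breaks directly after it
    private const string Pattern = ".*;}(\\r\\n|\\r|\\n)*";

    // Removes every match, i.e. each line up to its last ";}" plus trailing line breaks.
    public static string Strip(string text) =>
        Regex.Replace(text, Pattern, string.Empty);
}
```

Applied to text shaped like the sample in the question, it removes the four style lines together with their line breaks and keeps "Hi There," and the rest of the message.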
However, this will also delete ordinary text, up to and including the sequence ;}, whenever the message itself contains ;}. If that can happen, you may want to be more specific about what the HTML lines look like:
.*\\(#default#VML\\);}(\\r\\n|\\r|\\n)*
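The more specific pattern could be wired into the code from the question roughly as follows. This is a sketch under assumptions: the pattern is again read as a regular C# string literal, the VML rules are stripped after the tag-removal step and before entity decoding (an ordering chosen here, not stated in the answer), and the names VmlAwareTextExtractor, StripVmlRules and HtmlToText are hypothetical.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: the question's two steps plus the answer's VML-specific pattern.
public static class VmlAwareTextExtractor
{
    // Regular C# string; the regex engine sees  .*\(#default#VML\);}(\r\n|\r|\n)*
    // It removes everything on a line up to and including the literal
    // "(#default#VML);}", plus any line breaks directly after it.
    private static readonly Regex VmlRule =
        new Regex(".*\\(#default#VML\\);}(\\r\\n|\\r|\\n)*");

    public static string StripVmlRules(string text) =>
        VmlRule.Replace(text, string.Empty);

    public static string HtmlToText(string emailText)
    {
        // Step 1 (question's code): remove anything that looks like a tag.
        string noTags = Regex.Replace(emailText, @"<(.|\n)*?>", string.Empty);
        // Step 2 (assumed placement): remove the leftover VML style rules.
        string noVml = StripVmlRules(noTags);
        // Step 3 (question's code): decode entities such as &amp;.
        return WebUtility.HtmlDecode(noVml);
    }
}
```

Unlike the generic pattern, StripVmlRules leaves a line such as "Thanks ;} see you" untouched, because it only matches lines containing (#default#VML);}.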
Explanation:
.*: matches any character except a newline (\n), zero or more consecutive times
\\(#default#VML\\);}: matches the literal sequence (#default#VML);}
(\\r\\n|\\r|\\n)*: matches the line breaks (\r\n, \r or \n) that directly follow, zero or more consecutive times, so they are removed together with the line
Don't try to remove HTML from text with regular expressions; use a whitelist-based library such as https://github.com/mganss/HtmlSanitizer