I have the following text which still contains some HTML code:
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
Hi There,
For the product team to have any chance in analysing this issue we need clarification on how to reproduce the problem.
My code at the moment is:
string replacedEmailText = Regex.Replace(emailText, @"<(.|\n)*?>", string.Empty);
string finalText = WebUtility.HtmlDecode(replacedEmailText);
How do I remove the top lines, such as this one?
v\:* {behavior:url(#default#VML);}
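For this specific example, you could use
.*;}(\\r\\n|\\r|\\n)*
as the pattern to match, replacing every match with an empty string. (The doubled backslashes are escaped for a regular C# string literal; in a verbatim @"..." string, use single backslashes.)
However, this also removes any ordinary line of text that happens to contain the sequence ;}. If that can occur in your input, make the pattern more specific to what the leftover lines look like:
.*\\(#default#VML\\);}(\\r\\n|\\r|\\n)*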
Explanation:
.* : matches any character except a line feed, zero or more consecutive times
\\(#default#VML\\);} : matches the literal sequence (#default#VML);}
(\\r\\n|\\r|\\n)* : matches line breaks (carriage return and/or line feed) zero or more consecutive times, so the trailing line break is removed together with the line
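To show how the more specific pattern could fit into the code from the question, here is a minimal sketch written as a C# top-level program. The helper name CleanEmailText and the sample string are assumptions made for illustration; they do not come from the original post. The pattern is written with single backslashes because it sits in a verbatim string.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (name chosen for this sketch): removes the leftover
// VML style lines first, then strips tags and decodes HTML entities.
static string CleanEmailText(string emailText)
{
    // Drop every line that ends in "(#default#VML);}" together with the
    // line breaks that follow it.
    string withoutVml = Regex.Replace(
        emailText, @".*\(#default#VML\);}(\r\n|\r|\n)*", string.Empty);

    // Tag-stripping pattern taken unchanged from the question.
    string withoutTags = Regex.Replace(withoutVml, @"<(.|\n)*?>", string.Empty);

    return WebUtility.HtmlDecode(withoutTags);
}

// Made-up sample input resembling the text in the question.
string sample =
    "v\\:* {behavior:url(#default#VML);}\r\n" +
    ".shape {behavior:url(#default#VML);}\r\n" +
    "Hi There,\r\n" +
    "please <b>clarify</b> &amp; retest.";

Console.WriteLine(CleanEmailText(sample));
```

For this sample, only the two lines of message text should remain. The VML lines are removed before the tags are stripped, so the order of the two Regex.Replace calls does not matter here, but it could if a leftover line also contained markup.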
Do not try to use regular expressions to remove HTML from text; use a whitelist-based library such as https://github.com/mganss/HtmlSanitizer