Regex to remove HTML, but with exceptions

In C#, I have the following regex to remove HTML tags from a string:

var regex = new Regex("<[^>]*(>|$)");
return regex.Replace(input, match => "");
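To see why the doubled brackets are a problem, here is a minimal sketch (C# 9 or later, top-level statements) that runs the question's pattern on a few sample strings. StripOriginal is a hypothetical helper name used only for illustration. Because [^>]* swallows everything up to the next ">", the text "a << b >> c" collapses to "a > c". The sketch also shows what the $ alternative does: an unclosed tag at the very end of the input, such as "x <b", is removed as well.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's pattern: a "<", then any run of characters other than ">",
// then either a closing ">" or the end of the input ($).
var original = new Regex("<[^>]*(>|$)");

// Hypothetical helper wrapping the question's Replace call.
string StripOriginal(string input) => original.Replace(input, match => "");

Console.WriteLine(StripOriginal("<b>bold</b> text")); // prints "bold text"
Console.WriteLine(StripOriginal("a << b >> c"));      // prints "a > c": "<< b >" was treated as a tag
Console.WriteLine(StripOriginal("x <b"));             // prints "x ": the unclosed "<b" is removed by the $ branch
```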

There are some cases where we need to allow doubled angle brackets, >> and <<. How do I change the above expression so that it simply skips them?

I'm not sure why the $ at the end is there either, but in any case, a negative lookbehind and a negative lookahead can solve this problem:

Regex regex = new Regex("(?<![<])<[^<>]+>(?![>])");
return regex.Replace(input, String.Empty);

This matches a < that is not preceded by another <, then one or more characters that are neither < nor >, and then a > that is not followed by another >.
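A minimal sketch (C# 9 or later, top-level statements) applying the answer's pattern to a few inputs; StripTags is a hypothetical helper name used only for illustration. Ordinary tags are stripped while the doubled brackets survive:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer:
//   (?<![<])  the "<" must not be preceded by another "<"
//   [^<>]+    one or more characters that are neither "<" nor ">"
//   (?![>])   the ">" must not be followed by another ">"
var regex = new Regex("(?<![<])<[^<>]+>(?![>])");

// Hypothetical helper wrapping the answer's Replace call.
string StripTags(string input) => regex.Replace(input, String.Empty);

Console.WriteLine(StripTags("a << b >> c <b>bold</b>")); // prints "a << b >> c bold"
Console.WriteLine(StripTags("<p>1 << 2</p>"));           // prints "1 << 2"
Console.WriteLine(StripTags("x <b"));                    // prints "x <b": an unclosed tag is kept
```

One behavioral difference from the question's pattern: because the (>|$) alternative is gone and [^<>]+ requires at least one character, an unclosed tag at the end of the input (the "x <b" case) and an empty "<>" are now left untouched.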
