Regex to remove HTML, with exceptions

Question

In C#, I have the following regular expression to remove HTML from a string:

var regex = new Regex("<[^>]*(>|$)");
return regex.Replace(input, match => "");

There are some cases where we need to allow double >> and <<. How do I change the above expression so that it simply skips these double angle brackets?

Answer 1

I'm not sure why the $ at the end is in there, but in any case, negative lookbehind and lookahead can solve this problem:

Regex regex = new Regex("(?<![<])<[^<>]+>(?![>])");
return regex.Replace(input, String.Empty);

This matches a < that is not preceded by another <, then the tag content (one or more characters other than < and >), and then a > that is not followed by another >.
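To see the behavior described above on concrete input, here is a minimal sketch that wraps the answer's pattern in a small helper. The names HtmlStripper and StripTags and the sample strings are assumptions made for illustration; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    // Pattern from the answer: a '<' not preceded by '<', then one or more
    // characters that are neither '<' nor '>', then a '>' not followed by '>'.
    private static readonly Regex TagRegex =
        new Regex("(?<![<])<[^<>]+>(?![>])");

    // Hypothetical helper: removes single-bracket tags and leaves
    // doubled brackets such as << and >> in place.
    public static string StripTags(string input)
    {
        return TagRegex.Replace(input, String.Empty);
    }
}

public static class Program
{
    public static void Main()
    {
        // prints "bold and <<kept>>"
        Console.WriteLine(HtmlStripper.StripTags("<b>bold</b> and <<kept>>"));

        // prints "a >> b << c d"
        Console.WriteLine(HtmlStripper.StripTags("a >> b << c <i>d</i>"));
    }
}
```

One consequence of the lookahead is worth keeping in mind: a tag immediately followed by another >, as in <b>>>, is left in place, because its closing > is followed by a >.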

Answer 1: accepted, 2016-03-17 14:58:21