Remove HTML tags and comments from a string in C#?

Question

How do I remove everything beginning with '<' and ending with '>' from a string in C#? I know it can be done with regex, but I'm not very good with it.

Answer 1

This is the tag pattern I quickly wrote for a recent small project:

string tagPattern = @"<[!--\W*?]*?[/]*?\w+.*?>";

I used it like this:

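// Requires: using System.Text.RegularExpressions;
MatchCollection matches = Regex.Matches(input, tagPattern);
foreach (Match match in matches)
{
    // Remove every occurrence of the matched tag text from the input.
    input = input.Replace(match.Value, string.Empty);
}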

It would likely need to be modified to handle script or style tags correctly.
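
One way the script/style handling mentioned above could look is sketched below. This is not the pattern from the answer: it swaps in three simpler passes with Regex.Replace, and the class name HtmlStripper and method name Strip are made up for illustration. It assumes reasonably well-formed markup; regex is not a full HTML parser, so a '>' inside an attribute value, for example, will still confuse the last pass.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class HtmlStripper
{
    // Singleline lets '.' match newlines, so multi-line comments and scripts are covered.
    private const RegexOptions Options = RegexOptions.Singleline | RegexOptions.IgnoreCase;

    public static string Strip(string input)
    {
        // 1. Comments first, since they may contain '>' or tag-like text.
        string result = Regex.Replace(input, @"<!--.*?-->", string.Empty, Options);

        // 2. script and style elements together with everything inside them.
        result = Regex.Replace(result, @"<(script|style)\b[^>]*>.*?</\1\s*>", string.Empty, Options);

        // 3. Any remaining tag.
        result = Regex.Replace(result, @"<[^>]+>", string.Empty, Options);

        return result;
    }
}
```

The order matters: removing comments and script/style blocks before the generic tag pass keeps their contents from leaking into the output as plain text.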

Answer 2

Another approach, without regex, that runs 8x faster than regex:

public static string StripTagsCharArray(string source)
{
    char[] array = new char[source.Length];
    int arrayIndex = 0;
    bool inside = false;
    for (int i = 0; i < source.Length; i++)
    {
        char let = source[i];
        if (let == '<')
        {
            inside = true;
            continue;
        }
        if (let == '>')
        {
            inside = false;
            continue;
        }
        if (!inside)
        {
            array[arrayIndex] = let;
            arrayIndex++;
        }
    }
    return new string(array, 0, arrayIndex);
}
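
Since the question also asks about comments, note that a single inside flag ends the skipped region at the first '>', so a comment such as <!-- a > b --> leaks part of its text. Below is a rough sketch of a scanner that treats comments as one unit. The names TagStripper and StripTagsAndComments are hypothetical, and unlike the code above it keeps a stray '>' that appears outside any tag.

```csharp
using System;
using System.Text;

// Hypothetical variant for illustration; not part of the original answer.
public static class TagStripper
{
    public static string StripTagsAndComments(string source)
    {
        var result = new StringBuilder(source.Length);
        int i = 0;
        while (i < source.Length)
        {
            if (string.CompareOrdinal(source, i, "<!--", 0, 4) == 0)
            {
                // Skip to the end of the comment; drop the rest if it is unterminated.
                int end = source.IndexOf("-->", i + 4, StringComparison.Ordinal);
                i = end == -1 ? source.Length : end + 3;
            }
            else if (source[i] == '<')
            {
                // Skip an ordinary tag; drop the rest if there is no closing '>'.
                int end = source.IndexOf('>', i + 1);
                i = end == -1 ? source.Length : end + 1;
            }
            else
            {
                // Plain text outside any tag or comment is copied through.
                result.Append(source[i]);
                i++;
            }
        }
        return result.ToString();
    }
}
```

Like the original, this makes a single pass over the input and does not try to understand script or style content.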

Answer 3

Non-regex option. Note that it still won't handle nested tags!

public static string StripHTML(string line)
{
    int beginStrip = line.IndexOf('<');
    while (beginStrip != -1)
    {
        int endStrip = line.IndexOf('>', beginStrip + 1);
        if (endStrip == -1)
        {
            // No closing '>' left: stop instead of calling Remove with a negative count.
            break;
        }
        line = line.Remove(beginStrip, (endStrip + 1) - beginStrip);
        beginStrip = line.IndexOf('<');
    }

    return line;
}

Answer 1
3, accepted, 2010-04-09 19:28:05

Answer 2
1, 2014-08-14 10:05:37

Answer 3
1, 2010-04-09 19:41:18
