Sanitizing an unknown number of descendants with HTML Agility Pack doesn't work

Question

The purpose of the code below is to accept strings from clients that might contain HTML, then remove styling, scripting, and certain tags, and replace heading (H) tags with B tags.

    private IDictionary<string, string[]> Whitelist;
    public vacatures PostPutVacancy(vacancy vacancy)
    {
        //List of allowed tags
        Whitelist = new Dictionary<string, string[]> {
            { "p", null },
            { "ul", null },
            { "li", null },
            { "br", null },
            { "b", null },
            { "table", null },
            { "tr", null },
            { "th", null },
            { "td", null },
            { "strong", null }
        };

        foreach (var item in vacancy.GetType().GetProperties())
        {
            if (vacancy.GetType().GetProperty(item.Name).PropertyType.FullName.Contains("String"))
            {
                var value = item.GetValue(vacancy, null);
                if (value != null)
                {
                    item.SetValue(vacancy, CallSanitizers(item.GetValue(vacancy, null)));
                    var test1 = item.GetValue(vacancy);
                }
            }
        }

        return vacancy;
    }

    private List<string> hList = new List<string>
    {
        { "h1"},
        { "h2"},
        { "h3"},
        { "h4"},
        { "h5"},
        { "h6"}
    };

    private string CallSanitizers(object obj)//==Sanitize()
    {
        string str = obj.ToString();

        if (str != HttpUtility.HtmlEncode(str))
        {
            doc.LoadHtml(str);
            SanitizeNode(doc.DocumentNode);
            string test = doc.DocumentNode.WriteTo().Trim();
            return doc.DocumentNode.WriteTo().Trim();
        }
        else
        {
            return str;
        }
    }

    private void SanitizeChildren(HtmlNode parentNode)
    {
        for (int i = parentNode.ChildNodes.Count - 1; i >= 0; i--)
        {
            SanitizeNode(parentNode.ChildNodes[i]);
        }
    }

    private void SanitizeNode(HtmlNode node)
    {
        if (node.NodeType == HtmlNodeType.Element)
        {
            if (!Whitelist.ContainsKey(node.Name))
            {
                if (hList.Contains(node.Name))
                {
                    HtmlNode b = doc.CreateElement("b");
                    b.InnerHtml = node.InnerHtml;
                    node.ParentNode.ReplaceChild(b, node);
                }
                else
                {
                    node.ParentNode.RemoveChild(node, true);
                }
            }

            if (node.HasAttributes)
            {
                for (int i = node.Attributes.Count - 1; i >= 0; i--)
                {
                    HtmlAttribute currentAttribute = node.Attributes[i];
                    node.Attributes.Remove(currentAttribute);
                }
            }
        }

        if (node.HasChildNodes)
        {
            SanitizeChildren(node);
        }
    }

It works, but there is one problem: child nodes of child nodes don't get sanitized. See the example below.

Input:

"Lorem ipsum<h1 style='font-size:38px;'><p style='font-size:38px;'>dolor sit</p></h1> amet <h1 style='font-size:38px;'><strong style='font-size:38px;'>consectetur adipiscing</strong></h1>"

Result:

"Lorem ipsum<b><p style='font-size:38px;'>dolor sit</p></b> amet <b style='font-size:38px;'><strong style='font-size:38px;'>consectetur adipiscing</strong></b>"

The problem must be that a child cannot be placed back into a changed parent: once the tag type changes, the parent is no longer recognized.

Does anybody know how to fix this?

Please post a comment if the question is unclear or not well formulated.

Thanks in advance.

Answer 1

This fixes it:

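    // Assumes doc is an HtmlDocument field of the class (its declaration is not shown here).
    private string CallSanitizers(string str)
    {
        // Plain text is unchanged by HTML encoding, so only strings containing markup are parsed.
        if (str != HttpUtility.HtmlEncode(str))
        {
            doc.LoadHtml(str);
            Sanitizers();
            return doc.DocumentNode.WriteTo().Trim();
        }
        else
        {
            return str;
        }
    }

    private string Sanitizers()
    {
        // Each pass snapshots the descendants with ToList() so nodes can be modified while iterating.
        // 1. Remove script and style elements together with their content.
        doc.DocumentNode.Descendants().Where(l => l.Name == "script" || l.Name == "style").ToList().ForEach(l => l.Remove());
        // 2. Rename heading elements to b in place, so their children stay attached.
        doc.DocumentNode.Descendants().Where(l => hList.Contains(l.Name)).ToList().ForEach(l => l.Name = "b");
        // 3. Strip every attribute from every node.
        doc.DocumentNode.Descendants().Where(l => l.Attributes != null).ToList().ForEach(l => l.Attributes.ToList().ForEach(a => a.Remove()));
        // 4. Remove elements that are not whitelisted, keeping their children.
        doc.DocumentNode.Descendants().Where(l => !Whitelist.Contains(l.Name) && l.NodeType == HtmlNodeType.Element).ToList().ForEach(l => l.ParentNode.RemoveChild(l, true));
        return doc.DocumentNode.OuterHtml;
    }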

    //list of tags that are replaced by <b></b>
    private List<string> hList = new List<string>
    {
        { "h1"},
        { "h2"},
        { "h3"},
        { "h4"},
        { "h5"},
        { "h6"}
    };

    List<string> Whitelist = new List<string>
    {
        { "p"},
        { "ul"},
        { "li"},
        { "br"},
        { "b"},
        { "table"},
        { "tr"},
        { "th"},
        { "td"},
        { "strong"}
    };

The input is:

"<head><script>alert('Hello!');</script></head><div><div><h1>Lorem ipsum </h1></div></div> <h1 style='font-size:38px;'><p style='font-size:38px;'>dolor </p></h1> sit <h1 style='font-size:38px;'><strong style='font-size:38px;'>amet</strong></h1>"

And the output is:

"<b>Lorem ipsum</b> <b><p>dolor</p></b> sit <b><strong>amet</strong></b>"
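
A side note on why the recursive version in the question probably misses nested nodes: `b.InnerHtml = node.InnerHtml` re-parses the markup into new nodes under `b`, while the code afterwards keeps stripping attributes and recursing on `node`, the old heading that is no longer part of the document. The following is a minimal sketch, not a definitive implementation, of a recursive alternative that sanitizes children first and renames headings in place. The class name `HtmlSanitizerSketch` and the `Dropped` set are hypothetical names introduced here; the whitelist and heading list are copied from the code above, and the only Html Agility Pack call not already used in this thread is `HtmlAttributeCollection.RemoveAll()`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class HtmlSanitizerSketch
{
    private static readonly HashSet<string> Whitelist = new HashSet<string>
        { "p", "ul", "li", "br", "b", "table", "tr", "th", "td", "strong" };

    private static readonly HashSet<string> Headings = new HashSet<string>
        { "h1", "h2", "h3", "h4", "h5", "h6" };

    // Elements removed together with their content (assumed set, matching the answer).
    private static readonly HashSet<string> Dropped = new HashSet<string>
        { "script", "style" };

    public static string Sanitize(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        SanitizeNode(doc.DocumentNode);
        return doc.DocumentNode.WriteTo().Trim();
    }

    private static void SanitizeNode(HtmlNode node)
    {
        // Visit a snapshot of the children first, so every descendant has
        // been handled before this node itself is renamed or unwrapped.
        foreach (HtmlNode child in node.ChildNodes.ToList())
        {
            SanitizeNode(child);
        }

        // Text, comment and document nodes are left as they are.
        if (node.NodeType != HtmlNodeType.Element)
        {
            return;
        }

        if (Dropped.Contains(node.Name))
        {
            node.Remove(); // drop the element and its content
            return;
        }

        if (Headings.Contains(node.Name))
        {
            node.Name = "b"; // rename in place; children stay attached
        }

        node.Attributes.RemoveAll();

        if (!Whitelist.Contains(node.Name))
        {
            // Unwrap: remove the element but keep its (already sanitized) children.
            node.ParentNode.RemoveChild(node, true);
        }
    }
}
```

Calling `HtmlSanitizerSketch.Sanitize(input)` with the input from the question should return markup without `style` attributes or `h1` tags. Comment nodes are left untouched in this sketch.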
