如何安全地注入 HTML 代碼（避免 HTML 注入）

Question

在我們的企業軟件中，我們允許客戶提供他們自己的 HTML 來自定義應用程序的“聯系”和“法律”頁面。 這是一個很好的功能，但由於我們的應用程序對實際提供 HTML 的應用程序一無所知，我想知道我將如何解決這樣的問題。 我已經閱讀了一些博客文章、SO 帖子並觀看了一些視頻，但這些只是解釋了 HTML 注入的危險或如何使用createElement或innerHTML或其他直接方法來做到這一點。

我正在尋找最安全的方法來顯示我無法直接控制的 HTML。 任何文章或圖書館將不勝感激。

Answer 1

您應該清理用戶輸入的 HTML 代碼。

一種方法是定義一個白名單：您定義一個標簽列表（並且對於每個標簽，您還可以定義一個允許的屬性列表），這些標簽允許出現在輸出 HTML 中。 未明確允許的所有其他標簽都將被清理刪除。

已經有很多可用的白名單。 例如，您可以將其用於TinyMCE HTML 編輯器（請參閱此處）：

<?xml version="1.0" encoding="UTF-8" ?>

<!--
    TinyMCE
-->

<anti-samy-rules xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="antisamy.xsd">

    <directives>
        <directive name="omitXmlDeclaration" value="true" />
        <directive name="omitDoctypeDeclaration" value="false" />
        <directive name="maxInputSize" value="100000" />
        <directive name="embedStyleSheets" value="false" />
        <directive name="useXHTML" value="true" />
        <directive name="formatOutput" value="true" />
    </directives>

    <common-regexps>

        <!--
            From W3C:
            This attribute assigns a class name or set of class names to an
            element. Any number of elements may be assigned the same class
            name or names. Multiple class names must be separated by white
            space characters.
        -->
        <regexp name="htmlTitle" value="[a-zA-Z0-9\s\-_',:\[\]!\./\\\(\)&amp;]*" />

        <!--  force non-empty with a '+' at the end instead of '*'
        -->
        <regexp name="onsiteURL" value="([\p{L}\p{N}\p{Zs}/\.\?=&amp;\-~])+" />

        <!--  ([\w\\/\.\?=&amp;;\#-~]+|\#(\w)+)
        -->

        <!--  ([\p{L}/ 0-9&amp;\#-.?=])*
        -->
        <regexp name="offsiteURL" value="(\s)*((ht|f)tp(s?)://|mailto:)[A-Za-z0-9]+[~a-zA-Z0-9-_\.@\#\$%&amp;;:,\?=/\+!\(\)]*(\s)*" />
    </common-regexps>

    <!--
        Tag.name = a, b, div, body, etc.
        Tag.action = filter: remove tags, but keep content, validate: keep content as long as it passes rules, remove: remove tag and contents
        Attribute.name = id, class, href, align, width, etc.
        Attribute.onInvalid = what to do when the attribute is invalid, e.g., remove the tag (removeTag), remove the attribute (removeAttribute), filter the tag (filterTag)
        Attribute.description = What rules in English you want to tell the users they can have for this attribute. Include helpful things so they'll be able to tune their HTML
    -->

    <!--
        Some attributes are common to all (or most) HTML tags. There aren't many that qualify for this. You have to make sure there's no
        collisions between any of these attribute names with attribute names of other tags that are for different purposes.
    -->

    <common-attributes>

        <attribute name="lang"
            description="The 'lang' attribute tells the browser what language the element's attribute values and content are written in">

            <regexp-list>
                <regexp value="[a-zA-Z]{2,20}" />
            </regexp-list>
        </attribute>

        <attribute name="title"
            description="The 'title' attribute provides text that shows up in a 'tooltip' when a user hovers their mouse over the element">

            <regexp-list>
                <regexp name="htmlTitle" />
            </regexp-list>
        </attribute>

        <attribute name="href" onInvalid="filterTag">

            <regexp-list>
                <regexp name="onsiteURL" />
                <regexp name="offsiteURL" />

                <!--
                -->
            </regexp-list>
        </attribute>

        <attribute name="align"
            description="The 'align' attribute of an HTML element is a direction word, like 'left', 'right' or 'center'">

            <literal-list>
                <literal value="center" />
                <literal value="left" />
                <literal value="right" />
                <literal value="justify" />
                <literal value="char" />
            </literal-list>
        </attribute>
        <attribute name="style"
            description="The 'style' attribute provides the ability for users to change many attributes of the tag's contents using a strict syntax" />
    </common-attributes>

    <!--
        This requires normal updates as browsers continue to diverge from the W3C and each other. As long as the browser wars continue
        this is going to continue. I'm not sure war is the right word for what's going on. Doesn't somebody have to win a war after
        a while?


    -->

    <global-tag-attributes>
        <attribute name="title" />
        <attribute name="lang" />
        <attribute name="style" />
    </global-tag-attributes>

    <tags-to-encode>
        <tag>g</tag>
        <tag>grin</tag>
    </tags-to-encode>

    <tag-rules>

        <!--  Remove  -->

        <tag name="script" action="remove" />
        <tag name="noscript" action="remove" />
        <tag name="iframe" action="remove" />
        <tag name="frameset" action="remove" />
        <tag name="frame" action="remove" />
        <tag name="noframes" action="remove" />
        <tag name="head" action="remove" />
        <tag name="title" action="remove" />
        <tag name="base" action="remove" />
        <tag name="style" action="remove" />
        <tag name="link" action="remove" />
        <tag name="input" action="remove" />
        <tag name="textarea" action="remove" />

        <!--  Truncate  -->
        <tag name="br" action="truncate" />

        <!--  Validate -->

        <tag name="p" action="validate">
            <attribute name="align" />
        </tag>
        <tag name="div" action="validate" />
        <tag name="span" action="validate" />
        <tag name="i" action="validate" />
        <tag name="b" action="validate" />
        <tag name="strong" action="validate" />
        <tag name="s" action="validate" />
        <tag name="strike" action="validate" />
        <tag name="u" action="validate" />
        <tag name="em" action="validate" />
        <tag name="blockquote" action="validate" />
        <tag name="tt" action="truncate" />

        <tag name="a" action="validate">
            <attribute name="href" onInvalid="filterTag" />

            <attribute name="nohref">

                <literal-list>
                    <literal value="nohref" />
                    <literal value="" />
                </literal-list>
            </attribute>

            <attribute name="rel">

                <literal-list>
                    <literal value="nofollow" />
                </literal-list>
            </attribute>
        </tag>

        <!--  List tags
        -->
        <tag name="ul" action="validate" />
        <tag name="ol" action="validate" />
        <tag name="li" action="validate" />
        <tag name="dl" action="validate" />
        <tag name="dt" action="validate" />
        <tag name="dd" action="validate" />
    </tag-rules>

    <css-rules>

        <property name="text-decoration" default="none"
            description="">

            <category-list>
                <category value="visual" />
            </category-list>

            <literal-list>
                <literal value="underline" />
                <literal value="overline" />
                <literal value="line-through" />
            </literal-list>
        </property>
    </css-rules>
</anti-samy-rules>

此策略旨在清理在 HTML 編輯器中輸入的 HTML，因此它可能適合您的需求。 您還可以在此處找到更多政策。

如果您使用 Java，則可以通過java-html-sanitizer實現該策略，也可以定義自定義策略。

Answer 2

您可以使用 sanitize-html。 更多信息： https : //www.npmjs.com/package/sanitize-html

如何安全地注入 HTML 代碼（避免 HTML 注入）

問題描述

2 個解決方案

解決方案1
1 已采納 2021-02-03 10:03:37

解決方案2
0 2021-01-31 03:56:52

如何安全地注入 HTML 代碼（避免 HTML 注入）

問題描述

2 個解決方案

解決方案1 1 已采納 2021-02-03 10:03:37

解決方案2 0 2021-01-31 03:56:52

解決方案1
1 已采納 2021-02-03 10:03:37

解決方案2
0 2021-01-31 03:56:52