How to make the Jsoup whitelist accept certain attribute content

Question

I'm using Jsoup with the relaxed whitelist. It seems perfect, but I would like to keep embedded image tags such as <img alt="" src="data:;base64.

Is there a way to modify the whitelist so that it also accepts those img tags?

Edit:

If I use Whitelist.relaxed().addProtocols("img", "src", "data"), those img tags are no longer removed. But that accepts anything after "data:", and I would like to keep them only if the src content starts with "data:;base64". Is that possible with jsoup?
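To illustrate why adding "data" as a protocol is too broad, here is a simplified, stdlib-only model of the two rules. It assumes, as a simplification rather than jsoup's exact code, that a protocol rule passes any value whose lowercase form starts with the protocol name followed by a colon. The class and method names are hypothetical.

```java
import java.util.Locale;

public class ProtocolCheckDemo {
    // Hypothetical, simplified model of a protocol rule: accept any value that
    // starts with "<protocol>:" (case-insensitive). This is not jsoup's actual code.
    static boolean passesProtocolRule(String value, String protocol) {
        return value.toLowerCase(Locale.ROOT).startsWith(protocol + ":");
    }

    // The narrower rule asked for in the question: src must start with "data:;base64".
    static boolean passesBase64Rule(String value) {
        return value.startsWith("data:;base64");
    }

    public static void main(String[] args) {
        String[] samples = {
            "data:;base64,R0lGODlhAQABAAAAACw=",
            "data:text/html,<b>not an image</b>",
            "http://example.com/a.png"
        };
        for (String s : samples) {
            System.out.println(s + " -> protocol rule: " + passesProtocolRule(s, "data")
                    + ", base64 rule: " + passesBase64Rule(s));
        }
    }
}
```

Both data: values pass the protocol rule, but only the first one passes the "data:;base64" prefix rule, which is the extra restriction the question is after.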

Answer 1

You can extend Whitelist and override isSafeAttribute to perform custom checks. Because Whitelist.relaxed() is a static factory method that returns a preconfigured instance, you can't extend it directly, so you'll have to copy its setup code into your subclass to get the same list:

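import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Whitelist;

public class RelaxedPlusDataBase64Images extends Whitelist {
    public RelaxedPlusDataBase64Images() {
        // Same setup as Whitelist.relaxed(), copied from the jsoup source
        // (the exact list may differ between jsoup versions)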
        addTags("a", "b", "blockquote", "br", "caption", "cite", "code", "col",
                "colgroup", "dd", "div", "dl", "dt", "em", "h1", "h2", "h3", "h4", "h5", "h6",
                "i", "img", "li", "ol", "p", "pre", "q", "small", "strike", "strong",
                "sub", "sup", "table", "tbody", "td", "tfoot", "th", "thead", "tr", "u",
                "ul");
        addAttributes("a", "href", "title");
        addAttributes("blockquote", "cite");
        addAttributes("col", "span", "width");
        addAttributes("colgroup", "span", "width");
        addAttributes("img", "align", "alt", "height", "src", "title", "width");
        addAttributes("ol", "start", "type");
        addAttributes("q", "cite");
        addAttributes("table", "summary", "width");
        addAttributes("td", "abbr", "axis", "colspan", "rowspan", "width");
        addAttributes("th", "abbr", "axis", "colspan", "rowspan", "scope", "width");
        addAttributes("ul", "type");
        addProtocols("a", "href", "ftp", "http", "https", "mailto");
        addProtocols("blockquote", "cite", "http", "https");
        addProtocols("cite", "cite", "http", "https");
        addProtocols("img", "src", "http", "https");
        addProtocols("q", "cite", "http", "https");
    }

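    @Override
    protected boolean isSafeAttribute(String tagName, Element el, Attribute attr) {
        // Keep <img src="..."> when the value is an embedded base64 image...
        if ("img".equals(tagName)
                && "src".equals(attr.getKey())
                && attr.getValue().startsWith("data:;base64")) {
            return true;
        }
        // ...otherwise apply the normal relaxed-whitelist rules.
        return super.isSafeAttribute(tagName, el, attr);
    }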
}

As you haven't provided the code you're using to parse or the HTML you're sanitizing, I haven't tested this.
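The startsWith("data:;base64") test above only inspects the prefix, so any value beginning with data:;base64 is kept regardless of what follows. If a stricter check is wanted, one option (my assumption, not something jsoup provides) is to also require the comma separator and verify that the payload decodes as Base64 using java.util.Base64 (Java 8+). The class and method names below are hypothetical; the idea is that isBase64DataUri(attr.getValue()) could replace the startsWith call in the override.

```java
import java.util.Base64;

public class DataUriCheck {
    // Hypothetical helper, stricter than startsWith("data:;base64"):
    // requires the exact prefix "data:;base64," followed by a non-empty,
    // valid standard-Base64 payload.
    static final String PREFIX = "data:;base64,";

    static boolean isBase64DataUri(String value) {
        if (value == null || !value.startsWith(PREFIX)) {
            return false;
        }
        String payload = value.substring(PREFIX.length());
        if (payload.isEmpty()) {
            return false;
        }
        try {
            // The basic decoder throws IllegalArgumentException on any
            // character outside the Base64 alphabet or on bad padding.
            Base64.getDecoder().decode(payload);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isBase64DataUri("data:;base64,R0lGODlhAQABAAAAACw=")); // true
        System.out.println(isBase64DataUri("data:;base64,not base64!"));          // false
        System.out.println(isBase64DataUri("data:;base64"));                      // false
    }
}
```

If the embedded images in your HTML contain line breaks, the basic decoder will reject them. Base64.getMimeDecoder() tolerates line breaks, but it also ignores every other character outside the Base64 alphabet, which weakens the check.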

Answer 1 (score 9, accepted, 2014-06-30 19:57:11)