How to make a Jsoup whitelist accept certain attribute content

I'm using Jsoup with the relaxed whitelist. It seems perfect, but I would like to keep embedded image tags such as <img alt="" src="data:;base64...">.

Is there a way to modify the whitelist so that it also accepts those img tags?

Edit:

If I use Whitelist.relaxed().addProtocols("img","src","data"), those img tags are no longer removed. However, that accepts anything after "data:", and I would like to keep them only if the src content starts with "data:;base64". Is that possible with jsoup?

You can extend Whitelist and override isSafeAttribute to perform custom checks. Because Whitelist.relaxed() is a static factory method that returns a plain Whitelist, it cannot be subclassed directly, so you'll have to copy its setup code to build the same list:

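// Newer jsoup releases (1.14.1 onward) provide this class under the name Safelist.
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Whitelist;

public class RelaxedPlusDataBase64Images extends Whitelist {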
    public RelaxedPlusDataBase64Images() {
        //copied from Whitelist.relaxed()
        addTags("a", "b", "blockquote", "br", "caption", "cite", "code", "col",
                "colgroup", "dd", "div", "dl", "dt", "em", "h1", "h2", "h3", "h4", "h5", "h6",
                "i", "img", "li", "ol", "p", "pre", "q", "small", "strike", "strong",
                "sub", "sup", "table", "tbody", "td", "tfoot", "th", "thead", "tr", "u",
                "ul");
        addAttributes("a", "href", "title");
        addAttributes("blockquote", "cite");
        addAttributes("col", "span", "width");
        addAttributes("colgroup", "span", "width");
        addAttributes("img", "align", "alt", "height", "src", "title", "width");
        addAttributes("ol", "start", "type");
        addAttributes("q", "cite");
        addAttributes("table", "summary", "width");
        addAttributes("td", "abbr", "axis", "colspan", "rowspan", "width");
        addAttributes("th", "abbr", "axis", "colspan", "rowspan", "scope", "width");
        addAttributes("ul", "type");
        addProtocols("a", "href", "ftp", "http", "https", "mailto");
        addProtocols("blockquote", "cite", "http", "https");
        addProtocols("cite", "cite", "http", "https");
        addProtocols("img", "src", "http", "https");
        addProtocols("q", "cite", "http", "https");
    }

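    // Accept img src values that start with "data:;base64";
    // every other attribute goes through the standard whitelist checks.
    @Override
    protected boolean isSafeAttribute(String tagName, Element el, Attribute attr) {
        return ("img".equals(tagName)
                && "src".equals(attr.getKey())
                && attr.getValue().startsWith("data:;base64")) ||
            super.isSafeAttribute(tagName, el, attr);
    }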
}

As you haven't provided the code you're using to parse or the HTML you're sanitizing, I haven't tested this.
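
Since the override above has not been tested against real input, the src check can at least be tried on its own, without jsoup. The following is a minimal sketch using only the JDK. DataUriCheck and both method names are hypothetical and not part of jsoup. The first method mirrors the prefix test from the answer. The second is a stricter variant that goes beyond what the question asks for: it assumes the value has the exact form "data:;base64," followed by a payload that the standard Base64 decoder accepts.

```java
import java.util.Base64;

// Hypothetical helper, not part of jsoup: isolates the src check so it can be tried on its own.
final class DataUriCheck {
    // Prefix taken from the question.
    private static final String PREFIX = "data:;base64";

    private DataUriCheck() {}

    // Same test as the answer's override: the value must start with "data:;base64".
    static boolean hasBase64Prefix(String src) {
        return src != null && src.startsWith(PREFIX);
    }

    // Stricter variant (an assumption beyond the question): the prefix must be
    // followed directly by a comma and a non-empty payload that decodes as Base64.
    static boolean isDecodableBase64DataUri(String src) {
        if (!hasBase64Prefix(src)) {
            return false;
        }
        int comma = src.indexOf(',');
        if (comma != PREFIX.length()) {
            return false; // rejects e.g. "data:;base64extra,..."
        }
        String payload = src.substring(comma + 1);
        if (payload.isEmpty()) {
            return false;
        }
        try {
            // The basic decoder throws on characters outside the Base64 alphabet.
            Base64.getDecoder().decode(payload);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasBase64Prefix("data:;base64,aGVsbG8="));          // true
        System.out.println(hasBase64Prefix("data:text/html,x"));               // false
        System.out.println(isDecodableBase64DataUri("data:;base64,aGVsbG8=")); // true
        System.out.println(isDecodableBase64DataUri("data:;base64,***"));      // false
    }
}
```

If the stricter behavior is wanted, the startsWith call inside isSafeAttribute could be swapped for DataUriCheck.isDecodableBase64DataUri(attr.getValue()). Note that the basic decoder also rejects payloads containing line breaks or other whitespace, which may or may not suit the HTML being sanitized.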
