I'm using JSoup to sanitize some untrusted HTML. I discovered something unexpected when I call
String html = "<div id='foo'><script type='text/javascript'>alert('hello');</script></div>";
String cleanedHtml = Jsoup.clean(html, Whitelist.relaxed());
At this point cleanedHtml is
<div></div>
So the <script> tag has correctly been removed, but mysteriously, so has the id attribute of the <div>. Is there any good reason why this should be removed, or is it a bug?
By default the id attribute is removed; add it as an allowable attribute:
Whitelist whitelist = Whitelist.relaxed().addAttributes("div", "id");
System.out.println(Jsoup.clean(html, whitelist));
=> <div id="foo"></div>
Is it a bug? Not AFAIC; it's in the source. IMO there are documentation bugs, though.
Is there "any good reason" why this should be removed? Not sure about that one, but attributes like this aren't structural: removing one doesn't change the shape of the DOM tree. That's the thing about whitelists: they only let through what is explicitly allowed, and must be curated to match your precise needs.
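To make the "explicitly allow" point concrete, here is a minimal sketch of the idea in plain Java. It is not how JSoup implements Whitelist; the ToyWhitelist class and its allow and filterAttributes methods are hypothetical names invented for this illustration. It only shows why an attribute that was never listed for a tag gets dropped, and why listing it keeps it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of an attribute whitelist; NOT JSoup's implementation.
class ToyWhitelist {
    // tag name -> attribute names explicitly permitted on that tag
    private final Map<String, Set<String>> allowed = new HashMap<>();

    // Permit a tag and, optionally, some attributes on it.
    public ToyWhitelist allow(String tag, String... attributes) {
        Set<String> names = allowed.computeIfAbsent(tag, t -> new HashSet<>());
        Collections.addAll(names, attributes);
        return this;
    }

    // Keep only the attributes that were explicitly allowed for this tag.
    public Map<String, String> filterAttributes(String tag, Map<String, String> attributes) {
        Set<String> permitted = allowed.getOrDefault(tag, Collections.<String>emptySet());
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            if (permitted.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("id", "foo");

        // The tag is allowed, but no attributes are listed for it.
        ToyWhitelist whitelist = new ToyWhitelist().allow("div");
        System.out.println(whitelist.filterAttributes("div", attrs)); // {}

        // After explicitly allowing id on div, the attribute survives.
        whitelist.allow("div", "id");
        System.out.println(whitelist.filterAttributes("div", attrs)); // {id=foo}
    }
}
```

The design choice mirrors the answer above: nothing passes by default, so every attribute you care about has to be added deliberately.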