Using Jsoup.clean(), jsoup turns the title attribute of an HTML link from:
<a href="" title="test &lt;br /&gt;">TEST</a>
into:
<a href="" title="test <br />">TEST</a>
This is the demo application:
Whitelist whitelist = new Whitelist();
whitelist.addTags("a");
whitelist.addAttributes("a", "href", "title");
String input = "<a href=\"\" title=\"test &lt;br /&gt;\">TEST</a>";
System.out.println("input: " + input);
String output = Jsoup.clean(input, whitelist);
System.out.println("output: " + output);
which prints:
input: <a href="" title="test &lt;br /&gt;">TEST</a>
output: <a href="" title="test <br />">TEST</a>
I tried to add OutputSettings with EscapeMode:
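OutputSettings outputSettings = new OutputSettings();
outputSettings.escapeMode(EscapeMode.xhtml);
// Presumably applied via the four-argument clean() overload; "" is the base URI.
String output = Jsoup.clean(input, "", whitelist, outputSettings);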
EscapeMode.base and EscapeMode.extended have no effect. EscapeMode.xhtml prints the following:
input: <a href="" title="test &lt;br /&gt;">TEST</a>
output: <a href="" title="test &lt;br />">TEST</a>
Any idea how to keep jsoup from altering the title attribute?
This is a known issue/behavior: https://github.com/jhy/jsoup/issues/684 (marked as "won't fix" by the jsoup team).
There's not a bug here.
When serializing (i.e., in your example, when you're printing out XML/HTML), we escape as few characters as necessary. That is why the > is not escaped to &gt;: because it's in a quoted attribute, there's no ambiguity that it's closing a tag, so it doesn't get escaped.
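To make the "escape as few characters as necessary" rule concrete, here is a rough sketch of how such a minimal attribute escaper could look. It is not jsoup's actual code: the class AttributeEscapeSketch and the method escapeAttributeValue are hypothetical names made up for illustration, and the xhtml flag only approximates what EscapeMode.xhtml does. The assumption is that inside a double-quoted attribute only & and " are ambiguous, and that < additionally needs escaping for XML output because XML forbids a raw < in attribute values.

```java
public class AttributeEscapeSketch {

    // Hypothetical helper (not part of jsoup): escapes a value that will be
    // written inside a double-quoted attribute, touching only the characters
    // that would otherwise be ambiguous.
    public static String escapeAttributeValue(String value, boolean xhtml) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':
                    // Always escaped, otherwise it could start an entity.
                    out.append("&amp;");
                    break;
                case '"':
                    // Always escaped, otherwise it would end the attribute.
                    out.append("&quot;");
                    break;
                case '<':
                    // Assumed rule: XML forbids a raw '<' in attribute values,
                    // HTML tolerates it.
                    out.append(xhtml ? "&lt;" : "<");
                    break;
                default:
                    // Everything else, including '>', is written unchanged.
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The decoded attribute value, as a parser would hold it in memory.
        String title = "test <br />";
        System.out.println("html:  title=\"" + escapeAttributeValue(title, false) + "\"");
        System.out.println("xhtml: title=\"" + escapeAttributeValue(title, true) + "\"");
    }
}
```

Either way the attribute value itself is unchanged; only its serialized form differs. Any HTML parser reading either output back should see the same title text, test <br />.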