jsoup parse html tag attribute

Question

For example:

<html>
   <head></head>
   <body sometag='"'></body>
</html>

When I use jsoup to parse this HTML like so:

Document doc = Jsoup.parse(html);
doc.outputSettings().prettyPrint(false);
System.out.println(doc.toString());

the output becomes:

<html>
   <head></head>
   <body sometag="&quot;"></body>
</html>

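Note what happened to the ' and " characters: the single quotes around the attribute value became double quotes, and the " inside it became &quot;. I don't want jsoup to rewrite the quotes or escape the value; I just need to extract some text. Is there any way to stop jsoup from doing this? Thanks a lot.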

Answer 1

Just don't use an HTML parser. Use an XML parser instead.

Document doc = Jsoup.parse(html, "", Parser.xmlParser());

Answer 2

I've played around a little with different String escaping approaches, and the easiest way to achieve this is the following, though it may not be exactly what you're after:

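import org.apache.commons.lang3.StringEscapeUtils; // assumption: Apache Commons Lang 3; Commons Text has the same method
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

String html = "<html> <head> </head> <body sometag='\"'> </body> </html>";
Document doc = Jsoup.parse(html);
doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
// Undo the entity escaping (e.g. &quot; back to ") in the serialized output
System.out.println(StringEscapeUtils.unescapeXml(doc.toString()));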
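If you'd rather not pull in Apache Commons just for the unescape step, something along these lines might do as a rough stand-in. This is only a sketch: the class name EntityUnescape and the helper unescapeBasicEntities are made up for illustration (they are not part of jsoup or Commons), and it only reverses the five predefined XML entities, not numeric references like &#34;.

```java
public class EntityUnescape {

    // Hypothetical helper, not a jsoup or Commons API: reverses only the five
    // predefined XML entities. Numeric references such as &#34; are left alone.
    public static String unescapeBasicEntities(String s) {
        return s.replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                // &amp; goes last so "&amp;quot;" turns into "&quot;", not a bare quote
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        // Stand-in for the serialized output shown in the question
        String serialized = "<body sometag=\"&quot;\"></body>";
        System.out.println(unescapeBasicEntities(serialized));
        // prints: <body sometag="""></body>
    }
}
```

Keep in mind that the unescaped result is no longer well-formed markup (the attribute value now contains a bare double quote inside double quotes), so this is only suitable if you treat the output as plain text and don't parse it again.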
