For example:
<html>
<head></head>
<body sometag='"'></body>
</html>
When I parse this HTML with jsoup like this:
Document doc = Jsoup.parse(html);
doc.outputSettings().prettyPrint(false);
System.out.println(doc.toString());
The output becomes:
<html>
<head></head>
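<body sometag="&quot;"></body>
</html>
Notice the ' and the ": the single quotes around the attribute value have become double quotes, and the " inside the value has become &quot;. I don't want jsoup to rewrite these characters; I just need to extract some text. Is there any way to stop jsoup from doing this? Thanks a lot.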
Just don't use an HTML parser. Use an XML parser instead.
Document doc = Jsoup.parse(html, "", Parser.xmlParser());
This may not be exactly what you are after, but I've played around a little with different string-escaping approaches, and the easiest way to achieve this is the following (StringEscapeUtils comes from Apache Commons Lang):
String html = "<html> <head> </head> <body sometag='\"'> </body> </html>";
Document doc = Jsoup.parse(html);
doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
System.out.println( StringEscapeUtils.unescapeXml( doc.toString() ) );
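To see why the serializer writes &quot; in the first place, here is a minimal standard-library sketch of attribute escaping. It is not jsoup's implementation: the class AttributeQuoting and its methods escapeAttribute, unescapeAttribute and serialize are hypothetical names made up for this illustration, and only the & and " characters are handled.

```java
// Hypothetical helper for illustration only; this is NOT jsoup's code.
class AttributeQuoting {

    // Escape a raw attribute value so it can sit inside double quotes.
    // '&' is replaced first so the '&' of "&quot;" is not escaped again.
    static String escapeAttribute(String value) {
        return value.replace("&", "&amp;").replace("\"", "&quot;");
    }

    // Reverse of escapeAttribute: "&quot;" first, then "&amp;".
    static String unescapeAttribute(String escaped) {
        return escaped.replace("&quot;", "\"").replace("&amp;", "&");
    }

    // Write name="value" the way a serializer that always uses
    // double quotes would.
    static String serialize(String name, String value) {
        return name + "=\"" + escapeAttribute(value) + "\"";
    }

    public static void main(String[] args) {
        String raw = "\""; // the value of sometag in the question

        // Escaped form: the value survives a round trip.
        String safe = serialize("sometag", raw);
        System.out.println(safe); // sometag="&quot;"
        System.out.println(unescapeAttribute(escapeAttribute(raw)).equals(raw)); // true

        // Naive form without escaping: three quotes in a row, so a
        // parser can no longer tell where the value ends.
        String naive = "sometag=\"" + raw + "\"";
        System.out.println(naive); // sometag="""
    }
}
```

The escaped and the single-quoted forms describe the same attribute value, so if the goal is only to read the text, jsoup's attr("sometag") on the parsed element should already return the plain " character. Unescaping the whole serialized document, as above, gives output that looks like the input but is likely no longer well-formed markup.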