I am trying to parse an XHTML file with Jsoup, and it's stripping the closing slash from some of my tags, i.e.:
<link rel="stylesheet" type="text/css" href="/css/assessment.css" />
becomes
<link rel="stylesheet" type="text/css" href="/css/assessment.css">
I have tried some of the other answers here:
Jsoup: How to convert a String containing HTML to a XHTML document?
https://github.com/jhy/jsoup/issues/511
jsoup: differnt result after updating from 1.7.3 to 1.8.1, how to avoid this?
With my latest attempt being:
File input = new File("src\\main\\resources\\templates\\assessmenttemplate.html");
Document doc = Jsoup.parse(input, "UTF-8", "");
doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
doc.outputSettings().charset("UTF-8");
I also tried to change the doctype:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
But the problem persists. How can I parse the HTML without stripping the trailing slashes?
This worked. Setting the output syntax to XML makes Jsoup write empty tags in self-closing form:
File input = new File("src\\main\\resources\\templates\\assessmenttemplate.html");
Document doc = Jsoup.parse(input, "UTF-8", "");
doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
doc.outputSettings().charset("UTF-8");
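To check the effect without the template file, here is a minimal sketch that applies the same output settings to an in-memory string. The class name XhtmlOutputDemo and the helper toXhtml are made up for illustration, and it assumes Jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

public class XhtmlOutputDemo {

    // Hypothetical helper (not part of Jsoup): parse an HTML string and
    // serialize it using XML syntax so that empty tags keep their closing slash.
    public static String toXhtml(String html) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings()
           .syntax(Document.OutputSettings.Syntax.xml)   // write <link ... /> instead of <link ...>
           .escapeMode(Entities.EscapeMode.xhtml)        // use the small XHTML entity set
           .charset("UTF-8");
        return doc.html();
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<link rel=\"stylesheet\" type=\"text/css\" href=\"/css/assessment.css\">"
                + "</head><body></body></html>";
        // The printed <link> element should end with a closing slash.
        System.out.println(toXhtml(html));
    }
}
```

Note that the input is still parsed with the HTML parser here, as in the answer above; only the serialization changes.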