How to get unformatted HTML from Jsoup
String[] testCases = {
    "<table><tbody><tr><td><div><inline>Normal Line Text</inline><br/></div></td></tr></tbody></table>",
};
for (String testString : testCases) {
    Document doc = Jsoup.parse(testString, "", Parser.xmlParser());
    Elements elements = doc.select("table");
    for (Element ele : elements) {
        System.out.println("===============================================");
        System.out.println(ele.html()); // pretty-printed by Jsoup
        System.out.println("-----------------------------------------------");
        // Stripping line breaks afterwards still leaves the inserted spaces
        System.out.println(ele.html().trim().replace("\n", "").replace("\r", ""));
    }
}
Output:
===============================================
<tbody>
<tr>
<td>
<div>
<inline>
Normal Line Text
</inline>
<br />
</div></td>
</tr>
</tbody>
-----------------------------------------------
<tbody> <tr> <td> <div> <inline> Normal Line Text </inline> <br /> </div></td> </tr></tbody>
Due to the formatting done by Jsoup, the text node values in the output change to include newlines.
Changing <inline> to <span> in the test case seems to work fine, but unfortunately we have legacy data/HTML containing <inline> tags generated by Redactor.
Try this:
Document doc = Jsoup.parse(testString,"", Parser.xmlParser());
doc.outputSettings().prettyPrint(false);
Hope it helps.
Taken from https://stackoverflow.com/a/19602313/3324704
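
For anyone who wants to try this end to end, here is a minimal, self-contained sketch that applies the prettyPrint(false) setting to the test string from the question. The class name UnformattedHtml and the helper rawInnerHtml are made up for this illustration and are not part of Jsoup; the sketch assumes the Jsoup jar is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

class UnformattedHtml {

    // Hypothetical helper (not a Jsoup API): returns the inner HTML of the
    // first element matching cssQuery, serialized without pretty-printing.
    static String rawInnerHtml(String markup, String cssQuery) {
        // Parse with the XML parser, as in the question, so unknown tags
        // such as <inline> are kept as-is.
        Document doc = Jsoup.parse(markup, "", Parser.xmlParser());
        // Turn off pretty-printing so the serializer adds no line breaks
        // or indentation around elements and text nodes.
        doc.outputSettings().prettyPrint(false);
        Element first = doc.select(cssQuery).first();
        return first == null ? "" : first.html();
    }

    public static void main(String[] args) {
        String testString =
            "<table><tbody><tr><td><div><inline>Normal Line Text</inline><br/></div></td></tr></tbody></table>";
        System.out.println(rawInnerHtml(testString, "table"));
    }
}
```

With pretty-printing off, the printed markup should come out on a single line and keep <inline>Normal Line Text</inline> intact, so the trim()/replace() workaround from the question is no longer needed. The output settings live on the Document, so the call only has to be made once after parsing.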