How to get unformatted HTML from Jsoup

Question

String[] testCases = {
        "<table><tbody><tr><td><div><inline>Normal Line Text</inline><br/></div></td></tr></tbody></table>",
};
for (String testString : testCases) {
    Document doc = Jsoup.parse(testString,"", Parser.xmlParser());
    Elements elements = doc.select("table");
    for (Element ele : elements) {
        System.out.println("===============================================");
        System.out.println(ele.html());                // Pretty-printed by Jsoup
        System.out.println("-----------------------------------------------");
        System.out.println(ele.html().trim().replace("\n", "").replace("\r", ""));    // Line breaks stripped, but the added indentation remains
    }
}

Output:

===============================================
<tbody>
 <tr>
  <td>
   <div>
    <inline>
     Normal Line Text
    </inline>
    <br />
   </div></td>
 </tr>
</tbody>
-----------------------------------------------
<tbody> <tr>  <td>   <div>    <inline>     Normal Line Text    </inline>    <br />   </div></td> </tr></tbody>

Because Jsoup pretty-prints its output, the text nodes gain newlines and indentation, so stripping the line breaks afterwards does not restore the original markup.

Changing <inline> to <span> in the test case seems to work fine, but unfortunately we have legacy data/HTML containing <inline> tags generated by Redactor.

Answer 1

Try turning off pretty-printing in the document's output settings:

Document doc = Jsoup.parse(testString,"", Parser.xmlParser());
doc.outputSettings().prettyPrint(false);

Hope it helps.

Taken from https://stackoverflow.com/a/19602313/3324704
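
For reference, here is a minimal sketch of how those two lines might fit into the question's test case. It assumes the jsoup library is on the classpath; the class name UnformattedHtmlDemo and the helper rawInnerHtml are made up for illustration and are not part of Jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class UnformattedHtmlDemo {

    // Hypothetical helper (not a Jsoup API): parses the markup with the XML parser,
    // disables pretty-printing, and returns the inner markup of the first element
    // matching cssQuery, or an empty string if nothing matches.
    public static String rawInnerHtml(String markup, String cssQuery) {
        Document doc = Jsoup.parse(markup, "", Parser.xmlParser());
        // The setting lives on the document and applies to its elements' html() output.
        doc.outputSettings().prettyPrint(false);
        Element first = doc.select(cssQuery).first();
        return first == null ? "" : first.html();
    }

    public static void main(String[] args) {
        String testString =
                "<table><tbody><tr><td><div><inline>Normal Line Text</inline><br/></div></td></tr></tbody></table>";
        // With pretty-printing off, no newlines or indentation are added around
        // the text inside <inline>.
        System.out.println(rawInnerHtml(testString, "table"));
    }
}
```

Because the setting is applied before serialization, there is no need for the trim()/replace() post-processing from the question, which could only remove line breaks and not the indentation added to the text nodes.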

Answer 1
1 accepted 2014-11-10 13:03:32
