How to parse new lines from HTML using Jsoup

Question

When I parse an HTML file using jsoup, text that spans multiple lines in the HTML file (separated by <br />) comes back as a single line without newlines (\n). How can I parse the multi-line HTML document into multi-line strings?

I am using the method: Element.text()

For example:

The HTML contains C code that is displayed correctly on multiple lines in the HTML file, but when I extract the text, all of it comes back on a single line without newline characters.

Answer 1

Replace <br /> with something else and then swap it back, like this:

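import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Document doc = Jsoup.connect("http://www.ejemplo.html").get(); // the fetched HTML still contains the <br> tags
// Swap each line break for the placeholder $$$; jsoup may serialize the tag as <br /> or <br> depending on the version
String temp = doc.html().replace("<br />", "$$$").replace("<br>", "$$$");
doc = Jsoup.parse(temp); // parse again

String text = doc.body().text().replace("$$$", "\n"); // swap the placeholder back to \n
// The new lines (\n) are back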
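
To make the round trip easier to see in isolation, here is a minimal sketch that uses only the Java standard library. It is not jsoup: the regex-based tag stripping and whitespace collapsing are a rough stand-in for what text() does, and the class name BrPlaceholderSketch, the method textWithNewlines and the MARKER constant are hypothetical names chosen for illustration. It assumes the marker never occurs in the page text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BrPlaceholderSketch {
    // Hypothetical marker; assumed never to appear in the page text.
    private static final String MARKER = "$$$";
    // Matches <br>, <br/> and <br />, case-insensitively.
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");
    // Crude tag matcher, a stand-in for real HTML parsing.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String textWithNewlines(String html) {
        // 1. Swap every line-break tag for the marker. quoteReplacement is
        //    needed because '$' is special in regex replacement strings.
        String marked = BR.matcher(html).replaceAll(Matcher.quoteReplacement(MARKER));
        // 2. Stand-in for text(): drop the remaining tags and collapse all
        //    whitespace runs to one space. A raw "\n" would be lost here,
        //    which is why the marker is used instead.
        String noTags = TAG.matcher(marked).replaceAll(" ");
        String flat = WHITESPACE.matcher(noTags).replaceAll(" ").trim();
        // 3. Swap the marker (and any single space next to it) back to "\n".
        return flat.replaceAll(" ?" + Pattern.quote(MARKER) + " ?", "\n");
    }

    public static void main(String[] args) {
        String html = "<p>int a;<br />\n  int b;<br>return a;</p>";
        System.out.println(textWithNewlines(html));
    }
}
```

The marker survives the whitespace-collapsing step because it contains no whitespace itself, which is the whole point of the detour through a placeholder.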

Answer 2

The text() method of Element (and TextNode) calls appendWhitespaceIfBr(...), which replaces every <br /> (or run of whitespace) with a single space. Unfortunately, I see no mechanism for turning this off without modifying the jsoup code.

But maybe you can try replacing all <br /> tags with a new subclass of Node.

2 answers

Answer 1
3 2012-12-06 00:25:35

Answer 2
0 2012-11-20 20:43:42
