How can I get the non-breaking spaces between two nodes when using a node visitor?
I am trying to parse the following HTML source code:
<a href="./">Home</a>
<a href="http://gouessej.wordpress.com/tag/tuer/">Blog</a>
I implement the interface org.jsoup.select.NodeVisitor. However, it seems to skip the content between </a> and <a. Disabling pretty printing doesn't solve my problem.
You can run the first JUnit test to reproduce this bug: https://github.com/gouessej/HtmlFlow/blob/patch-1/src/test/java/htmlflow/flowifier/test/TestFlowifier.java
It converts the HTML source code of my homepage into Java source code, converts this Java source code back to HTML, and compares the resulting HTML source code to the original.
PS: Actually, TextNode.getWholeText() returns the non-breaking space as a raw, unescaped character followed by \n instead of &nbsp;\n.
TextNode.getWholeText() returns unescaped text, so I just need to escape it by calling Entities.escape(TextNode.getWholeText()).
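To illustrate why escaping restores the entity, here is a minimal sketch that uses only the Java standard library. It assumes the text between the two anchors is parsed into a raw U+00A0 character followed by a newline. The helper escapeText is hypothetical and only a simplified stand-in for jsoup's Entities.escape (it covers four characters, nothing more); in real code you would call Entities.escape on the value returned by getWholeText(), as described above.

```java
public class NbspEscapeSketch {

    // Hypothetical, simplified stand-in for jsoup's Entities.escape(String).
    // It only handles the characters relevant to this question.
    static String escapeText(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '\u00A0': out.append("&nbsp;"); break; // non-breaking space
                case '&':      out.append("&amp;");  break;
                case '<':      out.append("&lt;");   break;
                case '>':      out.append("&gt;");   break;
                default:       out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed content of the text node between </a> and <a ...>:
        // a raw non-breaking space followed by a newline.
        String wholeText = "\u00A0\n";

        // Comparing the raw text with HTML source fails, because the source
        // contains the entity rather than the raw character.
        System.out.println(wholeText.equals("&nbsp;\n"));             // prints false

        // After escaping, the entity form is back and the comparison succeeds.
        System.out.println(escapeText(wholeText).equals("&nbsp;\n")); // prints true
    }
}
```

The newline is left untouched, so the escaped text can be written back between the two anchors without changing the layout of the original source.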