
How can I get the non-breaking spaces between two nodes when using a node visitor?

I am trying to parse the following HTML source code:

<a href="./">Home</a>&nbsp;&nbsp;&nbsp;
<a href="http://gouessej.wordpress.com/tag/tuer/">Blog</a>&nbsp;&nbsp;&nbsp;

I implemented the org.jsoup.select.NodeVisitor interface. However, the visitor seems to skip the content between </a> and the next <a. Disabling pretty printing doesn't solve my problem.

You can run the first JUnit test to reproduce this bug: https://github.com/gouessej/HtmlFlow/blob/patch-1/src/test/java/htmlflow/flowifier/test/TestFlowifier.java. It converts the HTML source code of my homepage into Java source code, converts that Java source code back to HTML, and compares the resulting HTML source code with the original.

PS: Actually, TextNode.getWholeText() appears to return \n instead of &nbsp;&nbsp;&nbsp;\n.

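TextNode.getWholeText() returns unescaped text, so the non-breaking spaces are not skipped: they are present as literal U+00A0 characters rather than as &nbsp; entities, which makes them look like ordinary whitespace. I just need to escape the text by calling Entities.escape(TextNode.getWholeText()).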
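
To make the fix concrete, here is a minimal sketch of a visitor that escapes each text node as it goes. It assumes a reasonably recent jsoup on the classpath; the class name NbspVisitorDemo and the helper escapedTextNodes are made up for this illustration and do not come from the linked test.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of jsoup or HtmlFlow.
public class NbspVisitorDemo {

    /** Visits every node under <body> and collects the escaped text of each TextNode. */
    static List<String> escapedTextNodes(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        doc.body().traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof TextNode) {
                    // getWholeText() keeps the text unnormalised; the entities were
                    // decoded at parse time, so &nbsp; is a literal U+00A0 here.
                    String raw = ((TextNode) node).getWholeText();
                    // Entities.escape turns U+00A0 back into an entity for output.
                    result.add(Entities.escape(raw));
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Nothing to do when leaving a node.
            }
        });
        return result;
    }

    public static void main(String[] args) {
        String html = "<a href=\"./\">Home</a>&nbsp;&nbsp;&nbsp;\n"
                + "<a href=\"http://gouessej.wordpress.com/tag/tuer/\">Blog</a>&nbsp;&nbsp;&nbsp;\n";
        // Brackets make the otherwise invisible whitespace nodes easy to spot.
        for (String text : escapedTextNodes(html)) {
            System.out.println("[" + text + "]");
        }
    }
}
```

With the escaping in place, the text nodes that sit between the links should come back with their &nbsp; entities instead of looking like a bare newline. Entities.escape(String) uses jsoup's default output settings; if the rest of the document is serialised with different settings, the escaped form may need to match those.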
