
How can I get the non-breaking spaces between two nodes when using a node visitor?

I'm trying to parse the following HTML source code:

<a href="./">Home</a>&nbsp;&nbsp;&nbsp;
<a href="http://gouessej.wordpress.com/tag/tuer/">Blog</a>&nbsp;&nbsp;&nbsp;

I have implemented the interface org.jsoup.select.NodeVisitor. However, it seems to skip the content between </a> and <a. Disabling pretty printing doesn't solve my problem.
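One way to check whether the visitor really skips that content is to log the name of every node it enters. This is a minimal sketch, assuming jsoup is on the classpath; the class `NodeNameLogger` and its `visit` helper are hypothetical names, and the HTML is the snippet above hard-coded as a string.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor that records the name of every node it enters,
// to check whether the text between </a> and <a is actually visited.
class NodeNameLogger implements NodeVisitor {
    private final List<String> names = new ArrayList<>();

    @Override
    public void head(Node node, int depth) {
        // Elements report their tag name (e.g. "a"); text nodes report "#text"
        names.add(node.nodeName());
    }

    @Override
    public void tail(Node node, int depth) {
        // Nothing to do when leaving a node
    }

    // Parses the HTML and returns the node names in document order, starting at <body>
    static List<String> visit(String html) {
        NodeNameLogger logger = new NodeNameLogger();
        Jsoup.parse(html).body().traverse(logger);
        return logger.names;
    }

    public static void main(String[] args) {
        String html = "<a href=\"./\">Home</a>&nbsp;&nbsp;&nbsp;\n"
                + "<a href=\"http://gouessej.wordpress.com/tag/tuer/\">Blog</a>&nbsp;&nbsp;&nbsp;\n";
        System.out.println(visit(html));
    }
}
```

If more `#text` entries show up than the two link labels ("Home" and "Blog"), the text after each `</a>` is being visited, and the question becomes what those text nodes contain.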

You can reproduce this bug by running the first JUnit test in https://github.com/gouessej/HtmlFlow/blob/patch-1/src/test/java/htmlflow/flowifier/test/TestFlowifier.java. It converts the HTML source code of my homepage into Java source code, converts this Java source code back into HTML, and compares the resulting HTML with the original source code.

PS: TextNode.getWholeText() seems to return \n instead of &nbsp;&nbsp;&nbsp;\n.
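To see what `getWholeText()` really contains, it helps to print code points rather than the raw string, because U+00A0 (the character that `&nbsp;` decodes to) looks like an ordinary space on a console. A sketch under the same assumptions (jsoup on the classpath, HTML from the question); `WholeTextInspector`, `codePoints` and `textAfterFirstLink` are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.TextNode;

import java.util.StringJoiner;

// Hypothetical helper that dumps the code points of the text after the first link.
class WholeTextInspector {

    // Formats each code point as U+XXXX so invisible characters become visible
    static String codePoints(String s) {
        StringJoiner joiner = new StringJoiner(" ");
        s.codePoints().forEach(cp -> joiner.add(String.format("U+%04X", cp)));
        return joiner.toString();
    }

    static String textAfterFirstLink(String html) {
        Document doc = Jsoup.parse(html);
        // The sibling right after the first <a> element is the text between </a> and the next <a
        TextNode text = (TextNode) doc.select("a").first().nextSibling();
        return text.getWholeText();
    }

    public static void main(String[] args) {
        String html = "<a href=\"./\">Home</a>&nbsp;&nbsp;&nbsp;\n"
                + "<a href=\"http://gouessej.wordpress.com/tag/tuer/\">Blog</a>&nbsp;&nbsp;&nbsp;\n";
        System.out.println(codePoints(textAfterFirstLink(html)));
    }
}
```

If the text node holds the decoded entities, this prints `U+00A0 U+00A0 U+00A0 U+000A` rather than only `U+000A`.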

TextNode.getWholeText() returns the unescaped text: each &nbsp; has already been decoded to a U+00A0 non-breaking space character, which looks like ordinary whitespace when printed. I just need to escape it by calling Entities.escape(TextNode.getWholeText()).
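The fix described here can be applied directly inside the visitor. This is a minimal sketch, assuming jsoup is on the classpath and that only text nodes are of interest (a real HTML-to-Java converter would also emit tags and attributes); `EscapingTextVisitor` and `escapedText` are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Entities;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

// Hypothetical visitor that re-escapes each text node so that
// U+00A0 characters are written back out as &nbsp;.
class EscapingTextVisitor implements NodeVisitor {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void head(Node node, int depth) {
        if (node instanceof TextNode) {
            // getWholeText() returns decoded text, e.g. U+00A0 instead of &nbsp;
            String raw = ((TextNode) node).getWholeText();
            // Entities.escape(String) uses jsoup's default output settings (base entities)
            out.append(Entities.escape(raw));
        }
    }

    @Override
    public void tail(Node node, int depth) {
        // Nothing to do when leaving a node
    }

    // Concatenates the escaped text of every text node under <body>
    static String escapedText(String html) {
        EscapingTextVisitor visitor = new EscapingTextVisitor();
        Jsoup.parse(html).body().traverse(visitor);
        return visitor.out.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"./\">Home</a>&nbsp;&nbsp;&nbsp;\n"
                + "<a href=\"http://gouessej.wordpress.com/tag/tuer/\">Blog</a>&nbsp;&nbsp;&nbsp;\n";
        System.out.print(escapedText(html));
    }
}
```

With the default settings, characters that UTF-8 can encode directly (for example `é`, originally written as `&eacute;`) may come back as literal characters rather than named entities, so a byte-for-byte round-trip comparison could still differ for such input.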
