
How can I remove non-breaking spaces from a JSoup 'Document'?

How can I remove these:

<td>&nbsp;</td>

or

<td width="7%">&nbsp;</td>

from my JSoup 'Document'? I've tried many methods, but these non-breaking space characters do not match anything with normal JSoup expressions or selectors.

The HTML entity &nbsp; (Unicode character NO-BREAK SPACE, U+00A0) can be represented in Java by the character \u00a0. Assuming that you want to remove every element that contains that character as its own text (and thus not every line, as you said in a comment), the following ought to work:

document.select(":containsOwn(\u00a0)").remove();
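If the selector does not match in your jsoup version, a more explicit variant is to inspect each element's own text nodes directly. The following is only a sketch under assumptions: the class and method names (NbspCellRemover, hasOwnNbsp, removeNbspCells) are made up for illustration, and it restricts itself to <td> elements as in the question rather than every element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

// Hypothetical helper, not part of jsoup itself.
class NbspCellRemover {
    private static final char NBSP = '\u00a0';

    // True if one of the element's own (direct child) text nodes holds a raw NBSP.
    static boolean hasOwnNbsp(Element el) {
        for (TextNode node : el.textNodes()) {
            // getWholeText() returns the text without whitespace normalisation.
            if (node.getWholeText().indexOf(NBSP) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Parses the HTML, removes every <td> whose own text contains NBSP,
    // and returns the resulting body HTML.
    static String removeNbspCells(String html) {
        Document doc = Jsoup.parse(html);
        for (Element td : doc.select("td")) {
            if (hasOwnNbsp(td)) {
                td.remove();
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>&nbsp;</td>"
                + "<td width=\"7%\">&nbsp;</td><td>keep</td></tr></table>";
        System.out.println(removeNbspCells(html));
    }
}
```

Note that this also removes cells that mix NBSP with other text; if only cells consisting of nothing but NBSP should go, compare the whole text against the single character instead.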

If you really mean to remove the entire line, then your best bet is to scan the HTML yourself, line by line.
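A line-by-line scan could look roughly like the sketch below. It uses only the Java standard library and works on the raw HTML string before it is handed to jsoup; the class and method names (NbspLineFilter, dropNbspLines) are hypothetical. It assumes each unwanted cell sits on a line of its own, since dropping a line that holds only part of an element would leave the markup unbalanced.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for filtering raw HTML source lines.
class NbspLineFilter {
    // Drops every source line that contains the &nbsp; entity
    // or the raw U+00A0 character, and rejoins the rest with '\n'.
    static String dropNbspLines(String html) {
        List<String> kept = new ArrayList<>();
        // "\\R" matches any line break sequence (Java 8 and later).
        for (String line : html.split("\\R")) {
            if (line.contains("&nbsp;") || line.indexOf('\u00a0') >= 0) {
                continue;
            }
            kept.add(line);
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String html = "<tr>\n<td>&nbsp;</td>\n<td width=\"7%\">&nbsp;</td>\n<td>keep</td>\n</tr>";
        System.out.println(dropNbspLines(html));
    }
}
```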
