Java JSoup library element.text() returns '&nbsp;' as a #160 ASCII character

I recently ran into a strange behavior in JSoup 1.3.3 (quite old, I know).

When a text node contains an &nbsp; entity, calling .text() on the element converts the entity to the #160 ASCII character.

Have you experienced this? Do you think this is correct behavior? (I checked the Jsoup repo for a reported bug and found none.)

Thanks,

Jan

A non-breaking space is not the same as a normal space. A non-breaking space is 0xA0 (160 decimal) in ISO-8859-* and Windows-1252, and U+00A0 in Unicode (encoded as 0xC2 0xA0 in UTF-8). So, depending on your exact encoding, this is correct behaviour.
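
To make the numbers above concrete, here is a minimal sketch in plain Java (no JSoup dependency). The string "foo\u00A0bar" only stands in for what text() would return for "foo&nbsp;bar", and the class NbspDemo and the helpers normalizeNbsp and hex are hypothetical names introduced for illustration. Note that 160 lies outside the 7-bit ASCII range (0 to 127), so the character is really a Latin-1/Unicode code point rather than an ASCII one.

```java
import java.nio.charset.StandardCharsets;

public class NbspDemo {

    // Hypothetical helper: replaces each no-break space (U+00A0)
    // with a plain space (U+0020).
    static String normalizeNbsp(String s) {
        return s.replace('\u00A0', ' ');
    }

    // Hypothetical helper: formats bytes as uppercase hex separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: this stands in for the result of element.text()
        // on markup containing "foo&nbsp;bar".
        String text = "foo\u00A0bar";

        // The fourth character has code point 160.
        System.out.println((int) text.charAt(3)); // 160

        // The same character in two encodings.
        System.out.println(hex("\u00A0".getBytes(StandardCharsets.UTF_8)));      // C2 A0
        System.out.println(hex("\u00A0".getBytes(StandardCharsets.ISO_8859_1))); // A0

        // It does not compare equal to a normal space until normalized.
        System.out.println(text.equals("foo bar"));                // false
        System.out.println(normalizeNbsp(text).equals("foo bar")); // true
    }
}
```

If the non-breaking spaces are not wanted, replacing U+00A0 with a normal space after extracting the text, as normalizeNbsp does here, is one simple option.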
