Java JSoup library element.text() returns '&nbsp;' as a #160 ASCII character

I recently ran into some strange behavior in the JSoup library, version 1.3.3 (quite old, I know).

When a text node contains an &nbsp; entity, calling .text() on the element converts it to the #160 ASCII char.

Have you experienced this? Do you think this is correct behavior? (I checked the Jsoup repo for a reported bug and found none.)

Thanks,

Jan

A non-breaking space is not the same as a normal space. The non-breaking space is 0xA0 (160 decimal) in ISO-8859-* and Windows-1252, and it is U+00A0 in Unicode (encoded as 0xC2 0xA0 in UTF-8). Strictly speaking, 160 lies outside the 7-bit ASCII range, so it is not an ASCII character at all. So depending on your exact encoding, this is correct behaviour.
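To make the byte values above concrete, here is a minimal sketch that uses only the Java standard library. It does not call Jsoup: the string `text` is a hand-built stand-in for what .text() would return, and the class name `NbspDemo` and the helpers `normalizeSpaces` and `hex` are hypothetical names made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class NbspDemo {
    // U+00A0 NO-BREAK SPACE, the character that the &nbsp; entity decodes to.
    public static final char NBSP = '\u00A0';

    // Hypothetical helper: replace every no-break space with an ordinary space (U+0020).
    public static String normalizeSpaces(String text) {
        return text.replace(NBSP, ' ');
    }

    // Hypothetical helper: render bytes as space-separated upper-case hex pairs.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by .text() (assumption: built by hand here).
        String text = "Hello" + NBSP + "World";
        String nbsp = String.valueOf(NBSP);

        System.out.println((int) text.charAt(5));                         // 160
        System.out.println(text.equals("Hello World"));                   // false
        System.out.println(hex(nbsp.getBytes(StandardCharsets.UTF_8)));      // C2 A0
        System.out.println(hex(nbsp.getBytes(StandardCharsets.ISO_8859_1))); // A0
        System.out.println(normalizeSpaces(text).equals("Hello World"));  // true
    }
}
```

If the no-break space gets in the way (for example when comparing against a string typed with ordinary spaces), replacing it explicitly as in `normalizeSpaces` is one option. Note that `String.trim()` and `Character.isWhitespace` do not treat U+00A0 as whitespace, so they will not remove it for you.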
