Converting between ISO-8859-1 and cp1251

My Android app uses an open-source library that only accepts text data in ISO-8859-1 encoding. I have a few users from Eastern Europe who would like to enter cp1251-encoded text. This seems to be a limitation of the open-source library, since Java itself fully supports both of these encodings as well as the Unicode encodings.

One option could be to modify the open-source library to support multiple character sets. Alternatively, would it be possible to convert cp1251 to ISO-8859-1 and then back again? Since they are both 8-bit encodings, it seems like you would be storing the same amount of data at the byte level. However, when the open-source library loads the byte data into a string with ISO-8859-1 encoding, I suspect any byte value not present in ISO-8859-1 would throw an exception.

I'm not a character set expert, but the fact that I can't find code samples doing this conversion leads me to believe it won't work, at least not reliably.

You are correct that this won't work well at all. Most of the non-ASCII characters in CP1251 are not present in ISO-8859-1. (CP1251 is a Cyrillic code page used in Eastern Europe, so its upper half is mostly Cyrillic letters; ISO-8859-1 is Western European, and its upper half is a mix of accented Latin characters, punctuation, and symbols.) A few characters are represented in both, but there are so few of them, and almost all of them are punctuation or symbols, that the overlap probably won't do you any good.
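To make the overlap concrete, here is a minimal Java sketch. It assumes the runtime provides the "windows-1251" charset (the usual Java name for CP1251). The class and method names are hypothetical and exist only for illustration. It lists the non-ASCII CP1251 characters that ISO-8859-1 can also encode, and shows what happens to Cyrillic text pushed through ISO-8859-1 at the character level.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Cp1251Overlap {
    // Assumption: the runtime supports the "windows-1251" charset name.
    static final Charset CP1251 = Charset.forName("windows-1251");

    /** Returns the non-ASCII CP1251 characters that ISO-8859-1 can also encode. */
    static String sharedNonAsciiChars() {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder shared = new StringBuilder();
        for (int b = 0x80; b <= 0xFF; b++) {
            // Decode one CP1251 byte into its Unicode character.
            char c = new String(new byte[] {(byte) b}, CP1251).charAt(0);
            if (latin1.canEncode(c)) {
                shared.append(c);
            }
        }
        return shared.toString();
    }

    /** Encodes text as ISO-8859-1 and decodes it again. */
    static String roundTripThroughLatin1(String text) {
        // String.getBytes(Charset) replaces unmappable characters
        // with the charset's replacement byte, which is '?' here.
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String shared = sharedNonAsciiChars();
        System.out.println("Shared non-ASCII characters: " + shared.length());

        // "Privet" in Cyrillic, written with escapes to avoid source-encoding issues.
        String cyrillic = "\u041F\u0440\u0438\u0432\u0435\u0442";
        System.out.println(roundTripThroughLatin1(cyrillic)); // prints ??????
    }
}
```

The shared set contains symbols such as the section sign and the copyright sign, but no Cyrillic letters, so a character-level conversion silently turns every Cyrillic letter into a question mark.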
