Java: convert a standard String to CP1250 with only one byte per char

I need to convert a standard String to CP1250 with exactly one byte per character, so that, for example, the Polish character 'ł' is encoded as 0xB3 rather than as a two-byte Unicode sequence. I tried something like this:

byte[] array = "ała".getBytes();
String s = new String(array, 0, array.length, Charset.forName("CP1250"));

But when I then call s.getBytes(), it returns more bytes than there are letters, and 'ł' takes 2 bytes, as in Unicode. I need to convert any String to bytes that exactly match the CP1250 codes listed here: https://pl.wikipedia.org/wiki/Windows-1250#Tablica_kod.C3.B3w

Do this by specifying the charset when converting the string to bytes:

    byte[] array = "ała".getBytes("CP1250");

You are converting a String to a byte array using Java's default charset, whatever that happens to be (it could be UTF-8 or something else; it is configurable). You are then converting those bytes back to a String while telling the decoder that they are encoded as CP1250, which they might not be, so you could end up with a corrupted String. Either way, you end up with a String again, which is not what you are asking for.
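To make the mismatch concrete, here is a minimal sketch of the round trip from the question. It assumes the default charset happens to be UTF-8 and passes it explicitly so the result does not depend on the machine. The class name RoundTripDemo and the helper roundTrip are made up for illustration, and the string is written with a \u escape so the source file's own encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Hypothetical helper: encodes with one charset, then decodes as CP1250,
    // which is what the question's code does when the default is not CP1250.
    static String roundTrip(String text, Charset assumedDefault) {
        byte[] raw = text.getBytes(assumedDefault);
        return new String(raw, 0, raw.length, Charset.forName("CP1250"));
    }

    public static void main(String[] args) {
        String original = "a\u0142a"; // "ała"

        // Assumption: the platform default is UTF-8, where 'ł' takes two bytes.
        String decoded = roundTrip(original, StandardCharsets.UTF_8);

        System.out.println(original.length());        // 3
        System.out.println(decoded.length());         // 4
        System.out.println(original.equals(decoded)); // false
    }
}
```

The two UTF-8 bytes of 'ł' are each decoded as a separate CP1250 character, so the resulting String has four characters instead of three.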

You need to tell getBytes() that you want the bytes to be encoded as CP1250, e.g.:

byte[] array = "ała".getBytes("CP1250");

Or, to avoid the checked UnsupportedEncodingException that the String overload declares:

byte[] array = "ała".getBytes(Charset.forName("CP1250"));
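If you want to check the result, a small sketch like the following prints the encoded bytes in hex. The class Cp1250Demo and the helpers toCp1250 and toHex are hypothetical names used only for this example, and it assumes your JRE ships the CP1250 (windows-1250) charset, which standard JDK builds normally do.

```java
import java.nio.charset.Charset;

public class Cp1250Demo {
    // Hypothetical helper: encodes the text as CP1250, one byte per character.
    static byte[] toCp1250(String text) {
        return text.getBytes(Charset.forName("CP1250"));
    }

    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Mask with 0xFF so negative byte values print as 00..FF.
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // "ała", written with a \u escape so the source file encoding does not matter.
        byte[] encoded = toCp1250("a\u0142a");

        System.out.println(encoded.length);  // 3
        System.out.println(toHex(encoded));  // 61 B3 61
    }
}
```

Note that getBytes() does not fail on characters that CP1250 cannot represent; it substitutes the charset's replacement byte (normally '?'), so the one-byte-per-char guarantee holds but the original character is lost.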
