
Understanding Java Encoding

I am trying to determine whether an in-house method decodes a byte array correctly for different encodings. The following code is how I approached generating the test data.

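import java.io.UnsupportedEncodingException;

public class Encoding {

  // Every single-byte value from 0x00 to 0xFF.
  static byte[] VALUES = new byte[256];
  static {
    for (int i = 0; i < VALUES.length; i++) {
      VALUES[i] = (byte) i;
    }
  }

  static String[] ENCODING = {"Windows-1252", "ISO-8859-1"};

  public static void main(String[] args) throws UnsupportedEncodingException {

    for (String encode : ENCODING) {
      for (byte value : VALUES) {
        // Decode one byte using the charset under test.
        byte[] inputByte = new byte[]{value};
        String input = new String(inputByte, encode);
        // houseMethod is the in-house method being tested (not shown).
        String houseInput = houseMethod(input.getBytes());
      }
    }
  }
}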

My question is about the call to the in-house method: which encoding will the bytes it receives be in? My understanding is that Java stores a String internally as UTF-16. So when I pass input.getBytes(), am I sending UTF-16 bytes, or bytes in the encoding I specified when I created the String? I am guessing UTF-16, but I am not sure. Should the call instead be:

houseMethod(input.getBytes(encode))
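A minimal sketch of the revised loop, assuming the in-house method can be told which charset to decode with. `houseMethodStub` is a hypothetical stand-in that simply decodes with that charset; swap in the real in-house method. The key change is that every `getBytes` call names the same charset that was used to build the String, so nothing depends on the platform default.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class EncodingHarness {

  // HYPOTHETICAL stand-in for the in-house method: decodes the bytes with the charset it is given.
  // Replace it with the real method under test.
  static String houseMethodStub(byte[] bytes, Charset charset) {
    return new String(bytes, charset);
  }

  // Returns the byte values 0x00..0xFF whose decoded character does not come back unchanged.
  static List<Integer> failingValues(String encode) {
    Charset charset = Charset.forName(encode);
    List<Integer> failures = new ArrayList<>();
    for (int i = 0; i <= 0xFF; i++) {
      byte[] inputByte = {(byte) i};
      String input = new String(inputByte, charset);
      // Encode with the same charset used for decoding, not the platform default.
      String houseInput = houseMethodStub(input.getBytes(charset), charset);
      if (!input.equals(houseInput)) {
        failures.add(i);
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    for (String encode : new String[] {"Windows-1252", "ISO-8859-1"}) {
      System.out.println(encode + " failures: " + failingValues(encode));
    }
  }
}
```

With ISO-8859-1 the failure list should be empty, because that charset maps every byte value to exactly one character. Any values reported for Windows-1252 are worth inspecting individually, since that charset leaves a few byte values (such as 0x81 and 0x8D) unassigned.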

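Neither. A String does not remember the charset it was decoded from, and getBytes() does not return the String's internal representation. See String.getBytes():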

Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.

You are well advised to use the String.getBytes(Charset) method instead and explicitly specify the desired encoding.
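A short sketch of the `Charset` overload, using the `StandardCharsets` constants so that no `UnsupportedEncodingException` has to be caught. It also bears on the UTF-16 part of the question: the same one-character string yields different bytes depending on the charset passed, and UTF-16 bytes appear only if you ask for them. The string `"\u00E9"` ("é") is just an illustrative value.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetBytesDemo {

  // Encodes the string with the given charset and formats the bytes as space-separated hex.
  static String hex(String s, Charset charset) {
    StringBuilder sb = new StringBuilder();
    for (byte b : s.getBytes(charset)) {
      sb.append(String.format("%02X ", b & 0xFF));
    }
    return sb.toString().trim();
  }

  public static void main(String[] args) {
    String s = "\u00E9"; // "é"
    System.out.println("ISO-8859-1: " + hex(s, StandardCharsets.ISO_8859_1)); // E9
    System.out.println("UTF-8:      " + hex(s, StandardCharsets.UTF_8));      // C3 A9
    System.out.println("UTF-16BE:   " + hex(s, StandardCharsets.UTF_16BE));   // 00 E9
    System.out.println("UTF-16:     " + hex(s, StandardCharsets.UTF_16));     // FE FF 00 E9 (leading byte-order mark)
  }
}
```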

As per the Java documentation for String.getBytes():

Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.

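So the bytes the in-house method receives depend on which OS you are running on, as well as on your locale settings. (Since Java 18, JEP 400 makes UTF-8 the default charset, so on newer JDKs the default is UTF-8 unless it is overridden.)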
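A small sketch that makes the platform dependence visible: it prints the JVM's default charset and checks that the no-argument `getBytes()` produces the same bytes as passing `Charset.defaultCharset()` explicitly. The printed charset name varies by machine and JDK version, so no particular output is implied.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class DefaultCharsetDemo {

  // True if the no-argument getBytes() matches an explicit call with the default charset.
  static boolean matchesDefault(String s) {
    return Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()));
  }

  public static void main(String[] args) {
    // Varies with OS and locale on older JDKs; UTF-8 by default from Java 18 (JEP 400).
    System.out.println("Default charset: " + Charset.defaultCharset());
    System.out.println("getBytes() uses it: " + matchesDefault("caf\u00E9"));
  }
}
```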

On the other hand, String.getBytes(encoding) ensures you get the bytes in the encoding you pass as a parameter.
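A sketch of the by-name overload with the two charsets from the question. The euro sign is an illustrative choice: Windows-1252 has a byte for it (0x80), while ISO-8859-1 does not, so `getBytes` substitutes that charset's replacement byte (`?`, 0x3F). An unknown charset name is reported as `UnsupportedEncodingException`; `"NO-SUCH-CHARSET"` is a deliberately made-up name.

```java
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

class NamedEncodingDemo {

  // Encodes with a charset given by name; unknown names surface as UnsupportedEncodingException.
  static byte[] encode(String s, String encoding) throws UnsupportedEncodingException {
    return s.getBytes(encoding);
  }

  public static void main(String[] args) throws UnsupportedEncodingException {
    String s = "\u20AC"; // euro sign
    System.out.println(Arrays.toString(encode(s, "Windows-1252"))); // [-128], i.e. 0x80
    System.out.println(Arrays.toString(encode(s, "ISO-8859-1")));   // [63], i.e. '?' (unmappable)
    try {
      encode(s, "NO-SUCH-CHARSET");
    } catch (UnsupportedEncodingException e) {
      System.out.println("Unsupported: " + e.getMessage());
    }
  }
}
```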
