
Java Internationalization

I have a Java string that I'm having trouble manipulating. I have a String, s, whose value is 丞 (a Chinese character I chose at random; I don't speak Chinese). If I call

String t = new String(s.getBytes());
if (s.equals(t))
    System.out.println("String unchanged");
else
    System.out.println("String changed");

Then I get the "String changed" result. Does anyone know what's going on?

Because getBytes(), called with no arguments, does the following:

Encodes this String into a sequence of bytes using the platform's default charset

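If your default charset is, for example, US-ASCII, that Chinese character cannot be represented in it, so the bytes you get back do not encode the original character.

No extra bit or byte is involved. In practice the unmappable character is typically replaced with the charset's replacement byte (usually '?'), so the original character is lost and decoding the bytes yields a different string.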
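To make the loss concrete, here is a minimal sketch that forces US-ASCII instead of relying on whatever the platform default happens to be. The class and method names (AsciiLossDemo, roundTripAscii) are made up for illustration, and the character is written as the escape \u4E1E so the example does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

public class AsciiLossDemo {
    // Encode to US-ASCII bytes, then decode those bytes back to a String.
    // getBytes(Charset) replaces unmappable characters with the charset's
    // replacement byte, which for US-ASCII is '?'.
    static String roundTripAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String s = "\u4E1E"; // the character from the question
        String t = roundTripAscii(s);
        System.out.println(s.equals(t) ? "String unchanged" : "String changed");
        System.out.println(t); // prints "?"
    }
}
```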

Try using getBytes(String charsetName):

public byte[] getBytes(String charsetName)

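with a charsetName that can represent the character, for example "UTF-8", and pass the same name to the String constructor when decoding the bytes.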
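As a rough sketch of that suggestion, the round trip below names UTF-8 on both the encoding and the decoding side. The class and method names (Utf8RoundTrip, roundTrip) are hypothetical; the checked UnsupportedEncodingException is wrapped because every Java platform is required to support UTF-8.

```java
import java.io.UnsupportedEncodingException;

public class Utf8RoundTrip {
    // Encode and decode with the same explicitly named charset.
    static String roundTrip(String s) {
        try {
            byte[] bytes = s.getBytes("UTF-8");
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Should not happen: UTF-8 support is mandatory on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String s = "\u4E1E"; // the character from the question
        String t = roundTrip(s);
        System.out.println(s.equals(t) ? "String unchanged" : "String changed");
    }
}
```

On Java 7 and later, the overloads that take java.nio.charset.StandardCharsets.UTF_8 do the same thing without the checked exception.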

The getBytes() method uses the default encoding. According to the docs:

The CharsetEncoder class should be used when more control over the encoding process is required.
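One way to use CharsetEncoder for this, sketched under the assumption that you would rather get an error than a silently replaced character: configure the encoder to report unmappable input, or ask it up front whether the string can be encoded. The class and method names here (StrictEncodeDemo, canEncode, encodeStrict) are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictEncodeDemo {
    // True if every character of s can be represented in the given charset.
    static boolean canEncode(String s, Charset charset) {
        return charset.newEncoder().canEncode(s);
    }

    // Encode s, failing loudly instead of substituting a replacement byte.
    static byte[] encodeStrict(String s, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Cannot encode in " + charset, e);
        }
    }

    public static void main(String[] args) {
        String s = "\u4E1E"; // the character from the question
        System.out.println(canEncode(s, StandardCharsets.US_ASCII)); // false
        System.out.println(canEncode(s, StandardCharsets.UTF_8));    // true
        System.out.println(encodeStrict(s, StandardCharsets.UTF_8).length); // 3
    }
}
```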

Actually, I figured this out; sorry for the post. I was relying on Java's default charset instead of explicitly specifying UTF-8. It works now.

String t = new String(s.getBytes()); may encode and decode the string using ASCII, or whatever the platform's default charset is. Use the following constructor to create the string with charsetName set to "UTF-8", and encode with the same charset:

String(byte[] bytes, int offset, int length, String charsetName)
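A small sketch of that constructor, assuming the bytes were produced with UTF-8 in the first place: the offset and length arguments select which bytes to decode, and they must fall on character boundaries. The names SliceDecodeDemo and decodeSlice are invented for this example.

```java
import java.io.UnsupportedEncodingException;

public class SliceDecodeDemo {
    // Decode `length` bytes starting at `offset` as UTF-8.
    static String decodeSlice(byte[] bytes, int offset, int length) {
        try {
            return new String(bytes, offset, length, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Should not happen: UTF-8 support is mandatory on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "A", the character from the question, then "B": 1 + 3 + 1 bytes in UTF-8.
        byte[] bytes = "A\u4E1EB".getBytes("UTF-8");
        String whole = decodeSlice(bytes, 0, bytes.length);
        String middle = decodeSlice(bytes, 1, 3); // only the 3-byte character
        System.out.println(whole.equals("A\u4E1EB"));  // true
        System.out.println(middle.equals("\u4E1E"));   // true
    }
}
```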
