Java Internationalization

Question

I have a Java string that I'm having trouble manipulating. I have a String, s, that has a value of 丞 (a Chinese character I chose at random, I don't speak Chinese). If I call

String t = new String(s.getBytes());
if (s.equals(t))
    System.out.println("String unchanged");
else
    System.out.println("String changed");

Then I get the String changed result. Does anyone know what's going on?

Answer 1

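Because getBytes(), according to its documentation:

Encodes this String into a sequence of bytes using the platform's default charset

If your default charset is, e.g., US-ASCII, it has no byte sequence for that Chinese character, so you won't get the same character back when the bytes are decoded.

In practice, a character the charset cannot represent is replaced during encoding (usually with '?'), so the original character is lost in the round trip.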

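Try using getBytes(String charsetName) instead:

public byte[] getBytes(String charsetName)

Pass a charsetName that can represent the character, such as "UTF-8", and use the same charset when you decode the bytes back into a String.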
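
To make the suggestion above concrete, here is a minimal sketch of the round trip from the question with the charset named explicitly on both sides. The class name RoundTrip and the helper roundTrip are made up for illustration; only getBytes(String) and the String(byte[], String) constructor come from the JDK. The character is written as the escape \u4E1E so the example does not depend on the source file's encoding.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class RoundTrip {
    // Hypothetical helper: encode s with the named charset, then decode
    // the resulting bytes with the same charset.
    static String roundTrip(String s, String charsetName) {
        try {
            byte[] bytes = s.getBytes(charsetName);
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            // The charset name is not known to this JVM.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String s = "\u4E1E"; // the character from the question

        System.out.println("Default charset: " + Charset.defaultCharset());
        // UTF-8 can represent the character, so the string survives.
        System.out.println("UTF-8 unchanged: " + s.equals(roundTrip(s, "UTF-8")));
        // US-ASCII cannot represent it, so the string comes back different.
        System.out.println("US-ASCII unchanged: " + s.equals(roundTrip(s, "US-ASCII")));
    }
}
```

The first line of output shows which charset the no-argument getBytes() would have used on your machine, which is usually the quickest way to confirm this diagnosis.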

Answer 2

The getBytes() method uses the default encoding. According to the docs:

The CharsetEncoder class should be used when more control over the encoding process is required.
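
As a rough illustration of what "more control" can mean, the sketch below uses CharsetEncoder so that an unencodable character is reported rather than silently replaced. The class name StrictEncode and the method encodeStrict are invented for this example, and returning null on failure is just one possible design.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class StrictEncode {
    // Hypothetical helper: returns the encoded bytes, or null if s contains
    // a character that the named charset cannot represent.
    static byte[] encodeStrict(String s, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // Encoding failed instead of substituting a replacement byte.
            return null;
        }
    }

    public static void main(String[] args) {
        String s = "\u4E1E"; // the character from the question
        System.out.println("Encodable as US-ASCII: " + (encodeStrict(s, "US-ASCII") != null));
        System.out.println("Encodable as UTF-8: " + (encodeStrict(s, "UTF-8") != null));
    }
}
```

With CodingErrorAction.REPORT the failure surfaces at encoding time, instead of showing up later as a string that no longer equals the original.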

Answer 3

Actually, I figured this out, sorry for the post. I was using the default Java charset instead of explicitly specifying UTF-8. It works now.

Answer 4

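String t = new String(s.getBytes()); encodes and decodes the string using the platform's default charset, which may be ASCII. Use the following constructor to create the string with charsetName set to UTF-8, and encode with s.getBytes("UTF-8") so that both sides use the same charset: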

String(byte[] bytes, int offset, int length, String charsetName)
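
A small sketch of that constructor in use, assuming the bytes are UTF-8 and sit inside a larger buffer. The class name DecodeRange, the helper decode, and the sample buffer contents are all made up for illustration.

```java
import java.io.UnsupportedEncodingException;

public class DecodeRange {
    // Hypothetical helper: decode length bytes starting at offset,
    // interpreting them in the named charset.
    static String decode(byte[] bytes, int offset, int length, String charsetName) {
        try {
            return new String(bytes, offset, length, charsetName);
        } catch (UnsupportedEncodingException e) {
            // The charset name is not known to this JVM.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // 'A', then the three UTF-8 bytes of U+4E1E, then 'B'.
        byte[] buffer = { 0x41, (byte) 0xE4, (byte) 0xB8, (byte) 0x9E, 0x42 };

        // Skip the leading 'A' and decode only the three-byte character.
        String t = decode(buffer, 1, 3, "UTF-8");
        System.out.println("Matches original: " + "\u4E1E".equals(t));
    }
}
```

The offset and length arguments only matter when the text occupies part of the array; for a whole array, the shorter String(byte[] bytes, String charsetName) constructor does the same job.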

4 answers

solution1
6 2009-10-19 23:09:18

solution2
2 2009-10-19 23:09:30

solution3
2 ACCEPTED 2009-10-19 23:14:18

solution4
1 2009-10-19 23:10:03
