I have a Java string that I'm having trouble manipulating. I have a String, s, that has a value of 丞 (a Chinese character I chose at random, I don't speak Chinese). If I call
String t = new String(s.getBytes());
if (s.equals(t))
    System.out.println("String unchanged");
else
    System.out.println("String changed");
Then I get the "String changed" result. Does anyone know what's going on?
Because getBytes() with no arguments does the following:
Encodes this String into a sequence of bytes using the platform's default charset
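If your default charset is, for example, US-ASCII,
it cannot represent that Chinese character, so the bytes you get back will not decode to the same character.
A character the charset cannot encode is typically replaced (with '?' in the case of US-ASCII), so the original value is lost in the process.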
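Try using getBytes(String charsetName):
public byte[] getBytes(String charsetName)
with a charsetName that can represent the character (for example "UTF-8"), and pass the same charset name when you build the new String from the bytes.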
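A minimal sketch of the round trip with an explicit charset on both sides. The class and method names (RoundTripDemo, roundTrip) are made up for illustration, and it uses the Charset overloads with StandardCharsets (Java 7+) rather than the String charsetName overload quoted above, which avoids the checked UnsupportedEncodingException:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Encodes s to bytes and decodes those bytes again, using the same charset both ways.
    static String roundTrip(String s, Charset charset) {
        byte[] bytes = s.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String s = "\u4E1E"; // the character from the question, written as an escape

        // UTF-8 can represent the character, so the round trip is lossless.
        System.out.println(s.equals(roundTrip(s, StandardCharsets.UTF_8)));    // prints true

        // US-ASCII cannot, so the character is replaced with '?' when encoding.
        System.out.println(s.equals(roundTrip(s, StandardCharsets.US_ASCII))); // prints false
    }
}
```

The second call reproduces the "String changed" result from the question on a platform whose default charset cannot represent the character.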
The getBytes() method uses the default encoding. According to the docs:
The CharsetEncoder class should be used when more control over the encoding process is required.
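As a rough illustration of what that extra control can look like, the sketch below uses CharsetEncoder to check up front whether a string can be encoded, and to fail loudly instead of silently substituting a replacement byte. The class and helper names (StrictEncodeDemo, canEncode, encodeStrict) are hypothetical, and wrapping the failure in IllegalArgumentException is just one possible choice:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictEncodeDemo {
    // True if every character of s can be represented in the given charset.
    static boolean canEncode(String s, Charset charset) {
        return charset.newEncoder().canEncode(s);
    }

    // Encodes s, throwing instead of substituting '?' for unmappable characters.
    static byte[] encodeStrict(String s, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Cannot encode string as " + charset, e);
        }
    }

    public static void main(String[] args) {
        String s = "\u4E1E";
        System.out.println(canEncode(s, StandardCharsets.UTF_8));    // prints true
        System.out.println(canEncode(s, StandardCharsets.US_ASCII)); // prints false
        System.out.println(encodeStrict(s, StandardCharsets.UTF_8).length); // prints 3
    }
}
```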
Actually, I figured this out, sorry for the post. I was using the default Java charset instead of explicitly specifying UTF-8. It works now.
String t = new String(s.getBytes()); encodes and decodes with the platform's default charset, which may be ASCII. Use the following constructor to create the string with charsetName set to "UTF-8", and encode with the same charset:
String(byte[] bytes, int offset, int length, String charsetName)
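A short sketch of calling that constructor, assuming the bytes were produced with UTF-8. The class and helper names (OffsetDecodeDemo, decodeUtf8) are made up for illustration. The offset and length arguments select which slice of the array to decode, and the charset-name overloads throw the checked UnsupportedEncodingException, which has to be handled:

```java
import java.io.UnsupportedEncodingException;

class OffsetDecodeDemo {
    // Decodes length bytes starting at offset as UTF-8.
    static String decodeUtf8(byte[] bytes, int offset, int length) {
        try {
            return new String(bytes, offset, length, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Not expected in practice: every Java platform must support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u4E1E";
        byte[] bytes = s.getBytes("UTF-8"); // encode with the same charset name
        String t = decodeUtf8(bytes, 0, bytes.length);
        System.out.println(s.equals(t) ? "String unchanged" : "String changed"); // prints String unchanged
    }
}
```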