I am trying to deal with an encoding problem: I want to transform the special characters in a string into correct UTF-8 characters.
When I execute this simple code:
System.out.println(new String("&eacute;".getBytes("UTF-8"), "UTF-8"));
In the console I expect 'é', but I get:
&eacute;
&eacute; is the HTML entity reference for the é character, not a UTF-8 encoded string. To decode it, you can use Commons Lang's org.apache.commons.lang.StringEscapeUtils:
String decodedStr = StringEscapeUtils.unescapeHtml("&eacute;");
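If pulling in Commons Lang is not an option, the idea behind the decoding can be sketched with the standard library alone. The class below is a hypothetical helper (the names EntitySketch, NAMED and decode are made up for illustration, and it assumes Java 9 or later): it only knows a handful of named entities plus numeric character references, so it is a rough sketch, not a replacement for StringEscapeUtils.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library: decodes a few named entities
// and decimal/hexadecimal numeric character references.
class EntitySketch {
    // Deliberately tiny table; a real decoder needs the full HTML entity list.
    private static final Map<String, String> NAMED = Map.of(
            "eacute", "\u00E9",
            "egrave", "\u00E8",
            "amp", "&",
            "lt", "<",
            "gt", ">");

    // Matches &#xHEX; or &#DEC; or &name;
    private static final Pattern ENTITY = Pattern.compile(
            "&(#[xX][0-9a-fA-F]{1,6}|#[0-9]{1,7}|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = fromCodePoint(Integer.parseInt(body.substring(2), 16), m.group());
            } else if (body.startsWith("#")) {
                replacement = fromCodePoint(Integer.parseInt(body.substring(1)), m.group());
            } else {
                // Unknown names are left exactly as they were written.
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the character for a valid code point, otherwise the original text.
    private static String fromCodePoint(int codePoint, String fallback) {
        return Character.isValidCodePoint(codePoint)
                ? new String(Character.toChars(codePoint))
                : fallback;
    }

    public static void main(String[] args) {
        System.out.println(decode("caf&eacute; &amp; th&#233;"));
    }
}
```

Whether the decoded character then shows up correctly in the console still depends on the console's own encoding, which is a separate question from entity decoding.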
Java Strings know nothing of SGML / XML / HTML5 entities. &eacute; is such an entity. It works in web browsers inside HTML because one of the DTDs, or the HTML5 spec, defines &eacute; as the letter e with acute accent by mapping it to the corresponding Unicode numeric character reference &#233;.
new String(someString.getBytes("UTF-8"), "UTF-8");
is a meaningless operation: it converts a String into bytes using an encoding that can represent all meaningful characters, then converts those bytes back into a String. The result is the same as using someString directly, except that you get a new object.
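A quick way to convince yourself of this is to compare the round-tripped string with the original. The sketch below uses a made-up class name (RoundTripSketch) and StandardCharsets.UTF_8 instead of the "UTF-8" string, which avoids the checked exception; otherwise it is the same operation as in the question.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows that encoding to UTF-8 and decoding again
// yields a String with the same content, only as a different object.
class RoundTripSketch {
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String entity = "&eacute;";
        String copy = roundTrip(entity);
        System.out.println(copy.equals(entity)); // true: same characters
        System.out.println(copy == entity);      // false: a new object
    }
}
```

The entity text survives unchanged because it consists of eight ordinary ASCII characters; nothing in the round trip interprets it as HTML.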
In order to get e with acute accent, you can do one of the following things:
1. Type the character directly in the source code: System.out.println("é");. This requires that your text editor and your Java compiler agree on the encoding of the source file. If you're working in a project, everybody has to understand and agree on a particular encoding. The recommended encoding these days is certainly UTF-8.
2. Use the corresponding Unicode escape instead: \u00E9.
PS: SGML / XML / HTML5 entities have nothing to do with UTF-8.
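The Unicode escape option can be sketched as follows. The class and method names (EscapeSketch, eAcute, utf8Hex) are invented for this example, and wrapping System.out in a UTF-8 PrintStream is an assumption about the console: it only helps if the terminal actually interprets the output as UTF-8.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: uses a Unicode escape so the source file itself
// stays pure ASCII and does not depend on the editor's encoding.
class EscapeSketch {
    // U+00E9, LATIN SMALL LETTER E WITH ACUTE, written as an escape.
    static String eAcute() {
        return "\u00E9";
    }

    // Hex dump of the UTF-8 bytes of a string, for inspection.
    static String utf8Hex(String s) {
        StringBuilder hex = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumes the console expects UTF-8.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(eAcute());
        out.println(utf8Hex(eAcute())); // C3A9
    }
}
```

The hex dump makes the PS concrete: the character is a single code point in the String, and UTF-8 is only the two-byte form it takes once it is encoded for output.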