
Can Character represent all Unicode code points?

Since a Java char is 16 bits wide, I am wondering how it can represent the full range of Unicode code points. It can only represent 65536 code points, is that right?

Yes, a Java char is a UTF-16 code unit. If you need to represent Unicode characters outside the Basic Multilingual Plane, you need to use surrogate pairs within a java.lang.String. The String class provides various methods to work with full Unicode code points, such as codePointAt(index).
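As a minimal sketch of the point above, the following compares the number of char values in a String with the number of code points it contains. The class name SurrogatePairDemo and its helper methods are hypothetical names chosen for illustration; the sample character U+1F600 is just one arbitrary supplementary character.

```java
class SurrogatePairDemo {
    // Number of UTF-16 code units (char values) in the string.
    static int codeUnitCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, written as its surrogate pair D83D DE00.
        String s = "a\uD83D\uDE00";

        System.out.println(codeUnitCount(s));                       // 3
        System.out.println(codePointCount(s));                      // 2
        // codePointAt reads the whole pair when index points at a high surrogate.
        System.out.println(Integer.toHexString(s.codePointAt(1)));  // 1f600
    }
}
```

Note that the index passed to codePointAt is still a char index, not a code point index.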

From section 3.1 of the Java Language Specification:

The Unicode standard was originally designed as a fixed-width 16-bit character encoding. It has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, using the hexadecimal U+n notation. Characters whose code points are greater than U+FFFF are called supplementary characters. To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as pairs of 16-bit code units, the first from the high-surrogates range, (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points and UTF-16 code units are the same.

The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding. A few APIs, primarily in the Character class, use 32-bit integers to represent code points as individual entities. The Java platform provides methods to convert between the two representations.
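The conversion between the two representations mentioned in the quote might look like the sketch below, which uses Character.toChars and Character.toCodePoint. The wrapper class CodePointConversion and its encode/decode methods are hypothetical names added for illustration, not part of the JDK.

```java
class CodePointConversion {
    // Code point (int) -> UTF-16 code units: one char for BMP characters,
    // two chars (a surrogate pair) for supplementary characters.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    // Surrogate pair -> code point (int).
    static int decode(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return Character.toCodePoint(high, low);
    }

    public static void main(String[] args) {
        char[] units = encode(0x1F600);
        System.out.println(units.length);                        // 2
        System.out.println(Integer.toHexString(units[0]));       // d83d
        System.out.println(Integer.toHexString(units[1]));       // de00
        System.out.println(Integer.toHexString(decode(units[0], units[1]))); // 1f600
        System.out.println(encode(0x41).length);                 // 1
    }
}
```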

See the Character docs for more information.

One char, which is an unsigned 16-bit value, can represent any code point up to U+FFFF, but not supplementary characters, whose code points are larger. Java is best thought of as using the UTF-16 encoding in char, so a supplementary character is actually represented as a pair of char values, a surrogate pair. A single char cannot hold such a character, but a String or char[] can, and the code point methods on String and Character handle the pairs.
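One practical consequence is that a loop over char values sees the two halves of a surrogate pair separately. A rough sketch of walking a String one code point at a time instead, assuming a hypothetical helper named codePointsOf:

```java
import java.util.ArrayList;
import java.util.List;

class CodePointIteration {
    // Collects the code points of s, advancing by one or two chars per step.
    static List<Integer> codePointsOf(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result.add(cp);
            // charCount is 1 for BMP code points and 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        // 'A', U+1F600 (as a surrogate pair), 'B'
        for (int cp : codePointsOf("A\uD83D\uDE00B")) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
        }
    }
}
```

On Java 8 and later, String.codePoints() returns the same sequence as an IntStream.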
