Unicode escape sequence for non-BMP plane character
In Java, Unicode characters can be represented using Unicode escape sequences for the UTF-16 encoding. Below is an example that represents a BMP plane character:
char ch = '\u00A5'; // '¥'
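The escape above works because U+00A5 lies below U+10000, so it fits in a single 16-bit char. The following is a minimal sketch that shows how many chars a code point needs. It uses the standard Character.isBmpCodePoint() (Java 7+) and Character.charCount() methods. The class name BmpCheck is hypothetical. 0x1F4AE is assumed as a sample non-BMP code point, matching the one used further down this page.

```java
// Sketch: does a code point fit in one char (BMP), or does it need a surrogate pair?
class BmpCheck {
    // Number of UTF-16 code units (chars) needed to encode the code point.
    static int utf16Units(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(Character.isBmpCodePoint(0x00A5));  // true: '¥' fits in one char
        System.out.println(Character.isBmpCodePoint(0x1F4AE)); // false: outside the BMP
        System.out.println(utf16Units(0x00A5));                // 1
        System.out.println(utf16Units(0x1F4AE));               // 2
    }
}
```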
You cannot do that for a non-BMP character with a single char constant, since a char is a single UTF-16 code unit. You have to use a String constant instead, such as:
final String s = "\uXXXX\uYYYY";
where XXXX is the high surrogate and YYYY is the low surrogate.
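If you do not know the surrogate values by heart, you can compute them. The following sketch uses the standard Character.highSurrogate() and Character.lowSurrogate() methods (Java 7+) and formats the result as escape text. The class and method names are hypothetical, and 0x1F4AE is only an assumed example code point.

```java
// Sketch: compute the surrogate pair for a non-BMP code point and print it as escape text.
class SurrogateEscapes {
    // Formats the two UTF-16 surrogates of a supplementary code point as
    // Java escape sequences (a backslash, the letter 'u', then four hex digits each).
    static String toEscapes(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a non-BMP code point");
        }
        char high = Character.highSurrogate(codePoint); // leading surrogate
        char low = Character.lowSurrogate(codePoint);   // trailing surrogate
        return String.format("\\u%04X\\u%04X", (int) high, (int) low);
    }

    public static void main(String[] args) {
        int codePoint = 0x1F4AE;
        System.out.println(toEscapes(codePoint));

        // A literal written with that pair holds exactly this one code point.
        String literal = "\uD83D\uDCAE";
        System.out.println(literal.codePointAt(0) == codePoint); // true
    }
}
```

For U+1F4AE this prints \uD83D\uDCAE. That is the pair to paste into the "\uXXXX\uYYYY" pattern.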
Another solution is to use an int to store the code point; you can then use Character.toChars() to obtain a char[] out of it:
final int codePoint = 0x1f4ae; // for instance
final char[] toChars = Character.toChars(codePoint); // two chars: high and low surrogate
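To see what Character.toChars() actually returns, here is a small self-contained check. For a non-BMP code point the array has length 2 and holds a high and a low surrogate, and wrapping it in a String gives a single code point. The class and helper names are hypothetical.

```java
// Sketch: inspect the char[] returned by Character.toChars() and turn it into a String.
class ToCharsDemo {
    // Converts a code point to a String via its UTF-16 chars.
    static String fromCodePoint(int codePoint) {
        char[] chars = Character.toChars(codePoint); // 1 char for BMP, 2 for non-BMP
        return new String(chars);
    }

    public static void main(String[] args) {
        int codePoint = 0x1F4AE;
        char[] chars = Character.toChars(codePoint);
        System.out.println(chars.length);                        // 2
        System.out.println(Character.isHighSurrogate(chars[0])); // true
        System.out.println(Character.isLowSurrogate(chars[1]));  // true

        String s = fromCodePoint(codePoint);
        System.out.println(s.codePointCount(0, s.length()));     // 1
    }
}
```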
Depending on what you use, you may also append code points directly (for instance, StringBuilder has an appendCodePoint() method for that).
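As an illustration of appending code points directly, the sketch below uses StringBuilder.appendCodePoint(int). This method writes one or two chars as needed, so BMP and non-BMP code points can be mixed freely. The helper name build is hypothetical.

```java
// Sketch: build a String from code points with StringBuilder.appendCodePoint().
class AppendCodePointDemo {
    // Appends each code point; non-BMP ones become a surrogate pair automatically.
    static String build(int... codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = build(0x00A5, 0x1F4AE);
        System.out.println(s.length());                     // 3 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
    }
}
```

Note that String.length() counts chars, not characters, which is why it reports 3 here.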
To avoid writing surrogate pairs for non-BMP characters and to obtain a String directly from a code point, there are several methods:
String test1 = new String(new int[] { 0x1f4ae }, 0, 1);
String test2 = String.valueOf(Character.toChars(0x1f4ae));
String test3 = Character.toString(0x1f4ae); // Character.toString(int) requires Java 11+
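As a quick sanity check, the three approaches can be compared directly. The sketch below assumes Java 11 or later, because Character.toString(int) was added there. The class name and the allAgree helper are hypothetical.

```java
// Sketch: confirm that the three code-point-to-String approaches agree (Java 11+).
class CodePointToString {
    // Returns true if all three conversions produce the same String.
    static boolean allAgree(int cp) {
        String test1 = new String(new int[] { cp }, 0, 1);
        String test2 = String.valueOf(Character.toChars(cp));
        String test3 = Character.toString(cp); // requires Java 11+
        return test1.equals(test2) && test2.equals(test3);
    }

    public static void main(String[] args) {
        System.out.println(allAgree(0x1F4AE));                // true
        System.out.println(Character.toString(0x1F4AE).length()); // 2
    }
}
```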