

Unicode escape sequence for non-BMP plane character


In Java, Unicode characters can be represented using Unicode escape sequences for the UTF-16 encoding. Below is an example that represents a BMP plane character:

char ch = '\u00A5'; // '¥'
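A quick way to see why this works for a BMP character but not for a non-BMP one is `Character.charCount`, which reports how many `char` values (UTF-16 code units) a code point needs. The sketch below is illustrative only; the class name `CodeUnitDemo` is hypothetical, and U+1F4AE is borrowed from the answers further down.

```java
public class CodeUnitDemo {
    // A BMP character fits in a single char, i.e. one UTF-16 code unit.
    static final char YEN = '\u00A5'; // U+00A5, the yen sign

    public static void main(String[] args) {
        // Character.charCount returns 1 for BMP code points and 2 for supplementary ones.
        System.out.println(Character.charCount(YEN));          // 1
        System.out.println(Character.charCount(0x1F4AE));      // 2: needs a surrogate pair
        System.out.println(Character.isBmpCodePoint(0x1F4AE)); // false
    }
}
```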

You cannot represent a non-BMP character with a single char constant, since a char is one UTF-16 code unit and a non-BMP character needs two of them (a surrogate pair). You have to use a String constant, such as:

final String s = "\uXXXX\uYYYY";

where XXXX is the high surrogate and YYYY is the low surrogate.
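To find the XXXX and YYYY values for a given code point, the standard UTF-16 rule is: subtract 0x10000, then add the top 10 bits to 0xD800 (high surrogate) and the bottom 10 bits to 0xDC00 (low surrogate). The following sketch computes the pair by hand and cross-checks it against `Character.highSurrogate` / `Character.lowSurrogate` (Java 7+). The class and method names are hypothetical; U+1F4AE is the example code point used in the answers.

```java
public class SurrogatePairDemo {
    // High surrogate: 0xD800 plus the upper 10 bits of (codePoint - 0x10000).
    static char high(int codePoint) {
        return (char) (0xD800 + ((codePoint - 0x10000) >> 10));
    }

    // Low surrogate: 0xDC00 plus the lower 10 bits of (codePoint - 0x10000).
    static char low(int codePoint) {
        return (char) (0xDC00 + ((codePoint - 0x10000) & 0x3FF));
    }

    public static void main(String[] args) {
        int cp = 0x1F4AE;
        // Print the pair in escape-sequence form, ready to paste into a String literal.
        System.out.printf("\\u%04X\\u%04X%n", (int) high(cp), (int) low(cp)); // \uD83D\uDCAE
        // The JDK performs the same computation.
        System.out.println(high(cp) == Character.highSurrogate(cp)); // true
        System.out.println(low(cp) == Character.lowSurrogate(cp));   // true
        // The hard-coded pair decodes back to the single code point.
        String s = "\uD83D\uDCAE";
        System.out.println(s.codePointAt(0) == cp); // true
    }
}
```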

Another solution is to use an int to store the code point; you can then use Character.toChars() to obtain a char[] from it:

final int codePoint = 0x1f4ae; // for instance
final char[] toChars = Character.toChars(codePoint); // {high surrogate, low surrogate}
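As a runnable sketch of the snippet above, the array returned by `Character.toChars` can be wrapped in a `String`; for a non-BMP code point the result has length 2 (code units) but contains one code point. The helper name `fromCodePoint` is hypothetical.

```java
public class ToCharsDemo {
    // Converts a code point to a String via Character.toChars.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        final int codePoint = 0x1f4ae;
        final char[] toChars = Character.toChars(codePoint);
        System.out.println(toChars.length); // 2: a surrogate pair

        final String s = fromCodePoint(codePoint);
        System.out.println(s.length());                      // 2 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
    }
}
```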

Depending on what you use, you may also be able to append code points directly (for instance, StringBuilder has an appendCodePoint() method for that).
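For example, `StringBuilder.appendCodePoint(int)` writes one or two `char` values as needed, so mixed BMP and non-BMP input needs no surrogate handling. This is a minimal sketch; the `build` helper is hypothetical.

```java
public class AppendCodePointDemo {
    // Builds a String from code points without handling surrogates by hand.
    static String build(int... codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            sb.appendCodePoint(cp); // appends one char for BMP, two for non-BMP
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = build(0x00A5, 0x1F4AE); // one BMP and one non-BMP character
        System.out.println(s.length());                      // 3 char values
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
    }
}
```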

To avoid writing surrogate pairs for non-BMP characters, there are several ways to obtain a String directly from a code point:

String test1 = new String(new int[] { 0x1f4ae }, 0, 1);
String test2 = String.valueOf(Character.toChars(0x1f4ae));
String test3 = Character.toString(0x1f4ae); // Character.toString(int) requires Java 11+
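To confirm that the three approaches are interchangeable, the sketch below builds all three strings and compares them with the explicit surrogate-pair literal for U+1F4AE. It assumes Java 11 or later because of `Character.toString(int)`; the class and method names are hypothetical.

```java
public class FromCodePointDemo {
    // Returns the three String representations built from the same code point.
    static String[] all(int cp) {
        String test1 = new String(new int[] { cp }, 0, 1);
        String test2 = String.valueOf(Character.toChars(cp));
        String test3 = Character.toString(cp); // Java 11+
        return new String[] { test1, test2, test3 };
    }

    public static void main(String[] args) {
        String[] s = all(0x1f4ae);
        System.out.println(s[0].equals(s[1]) && s[1].equals(s[2])); // true
        System.out.println(s[0].equals("\uD83D\uDCAE"));            // true
    }
}
```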
