Unicode escape sequence for a non-BMP plane character

Question


In Java, Unicode characters can be represented using Unicode escape sequences for the UTF-16 encoding. Below is an example that represents a BMP plane character:

char ch = '\u00A5'; // '¥'
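The premise can be checked directly: a code point fits in a single char only if it lies in the BMP (U+0000 to U+FFFF). The sketch below assumes Java 7+ for Character.isBmpCodePoint; the class name BmpCheck and the helper fitsInOneChar are illustrative, not part of the question. U+00A5 fits in one char, while a code point above U+FFFF such as U+1F4AE needs two:

```java
public class BmpCheck {
    // True if the code point lies in the BMP (U+0000..U+FFFF), i.e. fits in one char
    static boolean fitsInOneChar(int codePoint) {
        return Character.isBmpCodePoint(codePoint);
    }

    public static void main(String[] args) {
        char ch = '\u00A5'; // YEN SIGN (U+00A5)
        System.out.println(fitsInOneChar(ch));              // true
        System.out.println(Character.charCount(ch));        // 1

        int codePoint = 0x1F4AE;                            // outside the BMP
        System.out.println(fitsInOneChar(codePoint));       // false
        System.out.println(Character.charCount(codePoint)); // 2
    }
}
```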

Answer 1

You cannot do that with a single char constant, since a char holds exactly one UTF-16 code unit and a non-BMP character needs two (a surrogate pair). You have to use a String constant, such as:

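final String s = "\uD83D\uDCAE"; // U+1F4AE written as a surrogate pair

where \uD83D is the high surrogate and \uDCAE is the low surrogate. In general the constant has the form "\uXXXX\uYYYY", with XXXX the high surrogate and YYYY the low surrogate.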
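The surrogate values for a given code point follow from the UTF-16 encoding arithmetic: subtract 0x10000, then add the top 10 bits of the result to 0xD800 and the bottom 10 bits to 0xDC00. This sketch assumes the goal is the escape text to paste into source code; SurrogatePair and escapeFor are hypothetical names, and the result is cross-checked against Character.highSurrogate and Character.lowSurrogate (Java 7+):

```java
public class SurrogatePair {
    // Returns the Java escape text for the surrogate pair of a supplementary code point,
    // computed with the UTF-16 encoding arithmetic
    static String escapeFor(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("expected a code point in U+10000..U+10FFFF");
        }
        int offset = codePoint - 0x10000;    // 20-bit value
        int high = 0xD800 + (offset >>> 10); // top 10 bits -> high surrogate
        int low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits -> low surrogate
        return String.format("\\u%04X\\u%04X", high, low);
    }

    public static void main(String[] args) {
        int codePoint = 0x1F4AE;
        System.out.println(escapeFor(codePoint));

        // Cross-check against the JDK helpers and an explicit surrogate-pair literal
        String s = "\uD83D\uDCAE";
        System.out.println(Character.highSurrogate(codePoint) == s.charAt(0));
        System.out.println(Character.lowSurrogate(codePoint) == s.charAt(1));
        System.out.println(s.codePointAt(0) == codePoint);
    }
}
```

For 0x1F4AE it prints \uD83D\uDCAE followed by three lines of true.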

Another solution is to use an int to store the code point; you can then use Character.toChars() to obtain a char[] from it:

final int codePoint = 0x1f4ae; // for instance
final char[] toChars = Character.toChars(codePoint); // two chars: high and low surrogate
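A runnable version of the Character.toChars() approach, assuming the char[] is eventually wanted as a String (ToCharsDemo and fromCodePoint are illustrative names). It shows that a non-BMP code point yields a two-element array holding the surrogate pair, which still counts as one code point once wrapped in a String:

```java
public class ToCharsDemo {
    // Converts a code point to a String via Character.toChars
    static String fromCodePoint(int codePoint) {
        final char[] toChars = Character.toChars(codePoint); // 1 char for BMP, 2 otherwise
        return new String(toChars);
    }

    public static void main(String[] args) {
        final char[] toChars = Character.toChars(0x1f4ae);
        System.out.println(toChars.length);                         // 2
        System.out.println(Character.isHighSurrogate(toChars[0]));  // true
        System.out.println(Character.isLowSurrogate(toChars[1]));   // true

        String s = fromCodePoint(0x1f4ae);
        System.out.println(s.codePointCount(0, s.length()));        // 1
    }
}
```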

Depending on what you use, you may also be able to append code points directly (for instance, StringBuilder has an appendCodePoint() method for that).
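For example, StringBuilder.appendCodePoint(int) lets a mix of BMP and non-BMP code points be appended without handling surrogates at all. A minimal sketch; the class name AppendCodePointDemo and the method build are hypothetical:

```java
public class AppendCodePointDemo {
    // Builds a String from code points without writing surrogate pairs by hand
    static String build(int... codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            sb.appendCodePoint(cp); // appends one char (BMP) or two chars (surrogate pair)
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = build(0x00A5, 0x1F4AE);
        System.out.println(s.length());                      // 3 (1 + 2 chars)
        System.out.println(s.codePointCount(0, s.length())); // 2
    }
}
```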

Answer 2

There are several ways to obtain a String from a code point without writing the surrogate pair for a non-BMP character by hand:

String test1 = new String(new int[] { 0x1f4ae }, 0, 1);   // String(int[] codePoints, int offset, int count)
String test2 = String.valueOf(Character.toChars(0x1f4ae));
String test3 = Character.toString(0x1f4ae);                // Character.toString(int) requires Java 11+
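A quick check that the three approaches agree with the explicit surrogate-pair literal, assuming Java 11+ (needed for Character.toString(int); List.of needs Java 9+). FromCodePoint and allWays are illustrative names:

```java
import java.util.List;

public class FromCodePoint {
    // The three approaches, applied to one code point
    static List<String> allWays(int codePoint) {
        return List.of(
                new String(new int[] { codePoint }, 0, 1),    // String(int[], int, int)
                String.valueOf(Character.toChars(codePoint)), // via char[]
                Character.toString(codePoint));               // Java 11+
    }

    public static void main(String[] args) {
        String expected = "\uD83D\uDCAE"; // explicit surrogate pair for U+1F4AE
        for (String s : allWays(0x1f4ae)) {
            System.out.println(s.equals(expected)); // true
        }
    }
}
```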


Answer 1: score 8, accepted, 2015-08-05 08:45:29

Answer 2: score 0, 2022-05-27 10:42:00
