How to Convert UTF-16 Surrogate Decimal to Unicode in Java

I have some string data like:

&#55357;&#56842; &#55357;&#56842;

These are UTF-16 surrogate pairs in decimal format.

How can I convert them to Unicode code points in Java, so that my client receives a decimal HTML entity for the code point itself instead of the surrogate pair?

Example: I want to get &#128522; in the response for the string above.

Assuming you have already parsed the string to get the two numbers, just create a String from those two char values:

String s = new String(new char[] { 55357, 56842 });
System.out.println(s);

Output:

😊

To get the code point of that string:

s.codePointAt(0) // returns 128522
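
For reference, the value 128522 follows from the standard UTF-16 decoding arithmetic. The sketch below reproduces it by hand; the class name SurrogateMath and the method combine are made up for illustration, and in real code the built-in methods shown in this answer are the better choice.

```java
class SurrogateMath {
    // Hypothetical helper: combines a UTF-16 high/low surrogate pair by hand.
    // Each surrogate carries 10 payload bits, and the result is offset by 0x10000.
    static int combine(int high, int low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        // 55357 = 0xD83D and 56842 = 0xDE0A
        System.out.println(combine(55357, 56842)); // prints 128522
    }
}
```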

You don't have to create a string, though:

Character.toCodePoint((char) 55357, (char) 56842) // returns 128522
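
Putting it together for the whole input string, here is a minimal sketch that finds adjacent decimal entities, checks that they really form a high/low surrogate pair, and rewrites them as a single entity. The class name SurrogateEntityConverter and the method mergeSurrogateEntities are hypothetical. It assumes the entities are written in the standard form with exactly five digits and no space before the semicolon, which covers every surrogate value (55296 to 57343).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SurrogateEntityConverter {
    // Two adjacent five-digit decimal entities, e.g. "&#55357;&#56842;".
    private static final Pattern PAIR = Pattern.compile("&#(\\d{5});&#(\\d{5});");

    // Hypothetical helper: replaces each surrogate-pair entity sequence
    // with one entity holding the combined code point.
    static String mergeSurrogateEntities(String input) {
        Matcher m = PAIR.matcher(input);
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (m.find(pos)) {
            int hi = Integer.parseInt(m.group(1));
            int lo = Integer.parseInt(m.group(2));
            boolean isPair =
                hi >= Character.MIN_HIGH_SURROGATE && hi <= Character.MAX_HIGH_SURROGATE
                && lo >= Character.MIN_LOW_SURROGATE && lo <= Character.MAX_LOW_SURROGATE;
            if (isPair) {
                // Copy the text before the match, then emit the merged entity.
                out.append(input, pos, m.start());
                out.append("&#").append(Character.toCodePoint((char) hi, (char) lo)).append(';');
                pos = m.end();
            } else {
                // Not a surrogate pair: keep the first entity unchanged and
                // retry from the second one, which may start a real pair.
                int secondEntityStart = m.start(2) - 2; // step back over "&#"
                out.append(input, pos, secondEntityStart);
                pos = secondEntityStart;
            }
        }
        out.append(input.substring(pos));
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "&#55357;&#56842; &#55357;&#56842;";
        System.out.println(mergeSurrogateEntities(in)); // prints &#128522; &#128522;
    }
}
```

Entities that are not part of a surrogate pair, along with any other text, pass through unchanged.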
