
How to convert surrogate pair to Unicode scalar in Swift

The following example is taken from the Strings and Characters documentation:

[image]

The values 55357 (U+D83D in hex) and 56374 (U+DC36 in hex) form the surrogate pair that encodes the Unicode scalar U+1F436, which is the DOG FACE character. Is there any way to go in the other direction? That is, can I convert a surrogate pair into a scalar?

I tried

let myChar: Character = "\u{D83D}\u{DC36}"

but I got an "Invalid Unicode scalar" error.

This Objective-C answer and this project seem to be custom solutions, but is there anything built into Swift (especially Swift 2.0+) that does this?

There are formulas to calculate the original code point based on a surrogate pair and vice versa. From https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae:

Section 3.7 of The Unicode Standard 3.0 defines the algorithms for converting to and from surrogate pairs.

A code point C greater than 0xFFFF corresponds to a surrogate pair <H, L> as per the following formula:

H = Math.floor((C - 0x10000) / 0x400) + 0xD800
L = (C - 0x10000) % 0x400 + 0xDC00

The reverse mapping, i.e. from a surrogate pair <H, L> to a Unicode code point C, is given by:

 C = (H - 0xD800) * 0x400 + L - 0xDC00 + 0x10000 
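The two formulas above can be written directly in Swift. This is only a minimal sketch: the function names `surrogatePair(for:)` and `codePoint(high:low:)` are hypothetical helpers, not standard library API, and the code assumes a recent Swift version (it uses `Unicode.Scalar`).

```swift
// Hypothetical helper (not in the standard library):
// code point C -> surrogate pair <H, L>, per the first formula.
func surrogatePair(for codePoint: UInt32) -> (high: UInt16, low: UInt16)? {
    // Only code points above 0xFFFF (and within the Unicode range) need a pair.
    guard codePoint > 0xFFFF, codePoint <= 0x10FFFF else { return nil }
    let offset = codePoint - 0x10000
    let high = UInt16(offset / 0x400 + 0xD800)
    let low = UInt16(offset % 0x400 + 0xDC00)
    return (high, low)
}

// Hypothetical helper (not in the standard library):
// surrogate pair <H, L> -> code point C, per the reverse formula.
func codePoint(high: UInt16, low: UInt16) -> UInt32? {
    // Reject values that are not a high surrogate followed by a low surrogate.
    guard high >= 0xD800, high <= 0xDBFF, low >= 0xDC00, low <= 0xDFFF else { return nil }
    return (UInt32(high) - 0xD800) * 0x400 + (UInt32(low) - 0xDC00) + 0x10000
}

if let c = codePoint(high: 0xD83D, low: 0xDC36), let scalar = Unicode.Scalar(c) {
    print(String(c, radix: 16, uppercase: true), Character(scalar)) // prints 1F436 🐶
}
```

Both helpers return nil for out-of-range input rather than trapping, which seemed the safer choice for a sketch.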

Given a sequence of UTF-16 code units (i.e. 16-bit numbers, such as you get from String.utf16 or just an array of numbers), you can use the UTF16 type and its decode method to turn it into UnicodeScalars, which you can then convert into a String.

It's a bit grungy: it takes a generator (because it does stateful processing) and returns an enum that indicates either a result (with the scalar as its associated value), an error, or completion. Swift 2.0 pattern matching makes it a lot easier to use:

let u16data: [UInt16] = [0xD83D,0xDC36]
//or let u16data = "Hello, 🌍".utf16

var g = u16data.generate()
var s: String = ""
var utf16 = UTF16()
while case let .Result(scalar) = utf16.decode(&g) {
    print(scalar, &s)
}
print(s) // prints 🐶
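The snippet above uses Swift 2.0 names. In later Swift versions several of them were renamed (`generate()` became `makeIterator()`, and the `.Result` case became `.scalarValue`), so a rough equivalent, assuming Swift 5 or later, might look like the sketch below. The function name `decodeUTF16` is a hypothetical helper, not part of the standard library.

```swift
// Hypothetical helper: decodes UTF-16 code units into a String,
// returning nil if the input is ill-formed (e.g. an unpaired surrogate).
func decodeUTF16(_ codeUnits: [UInt16]) -> String? {
    var iterator = codeUnits.makeIterator()
    var decoder = UTF16()
    var result = ""
    while true {
        switch decoder.decode(&iterator) {
        case .scalarValue(let scalar):
            // A complete scalar was decoded (possibly from a surrogate pair).
            result.unicodeScalars.append(scalar)
        case .emptyInput:
            // All code units consumed.
            return result
        case .error:
            // Ill-formed UTF-16.
            return nil
        }
    }
}

let dog = decodeUTF16([0xD83D, 0xDC36])
print(dog ?? "invalid UTF-16") // prints 🐶

// Lenient alternative from the standard library: ill-formed sequences
// are replaced with U+FFFD instead of being reported as an error.
let lenient = String(decoding: [0xD83D, 0xDC36] as [UInt16], as: UTF16.self)
```

If strict error reporting is not needed, `String(decoding:as:)` is the shorter route.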
