
Swift String.Index vs transforming the String to an Array

The Swift documentation says that String.Index is used to index strings because different characters can take up different amounts of memory.

But I have seen a lot of people convert a String into an array, var a = Array(s), so that they can index by Int instead of String.Index (which is definitely easier).

So I wanted to test for myself whether the two behave exactly the same for all Unicode characters:

let cafeA = "caf\u{E9}" // eAcute
let cafeB = "caf\u{65}\u{301}" // combinedEAcute

let arrayCafeA = Array(cafeA)
let arrayCafeB = Array(cafeB)

print("\(cafeA) is \(cafeA.count) character \(arrayCafeA.count)")
print("\(cafeB) is \(cafeB.count) character \(arrayCafeB.count)")
print(cafeA == cafeB)

print("- A scalar")
for scalar in cafeA.unicodeScalars {
    print(scalar.value)
}
print("- B scalar")
for scalar in cafeB.unicodeScalars {
    print(scalar.value)
}

And here is the output:

café is 4 character 4
café is 4 character 4
true
- A scalar
99
97
102
233
- B scalar
99
97
102
101
769

And sure enough, as the documentation says, strings appear to be just an array of Character, with the grapheme cluster handled inside each Character. So why isn't String indexed by Int? What is the point of creating and using String.Index?

In a String, the byte representation is packed, so there is no way to know where the character boundaries are without traversing the string from the start.
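As a rough illustration of what "packed" means here, the sketch below compares Character counts with UTF-8 byte counts for the two spellings of "café" from the question. It assumes Swift 5 or later, where native strings are stored as UTF-8; the names precomposed, decomposed, indexA and indexB are chosen only for this example.

```swift
let precomposed = "caf\u{E9}"         // é as a single scalar
let decomposed = "caf\u{65}\u{301}"   // e followed by a combining acute accent

// Same number of Characters, different number of UTF-8 bytes.
print(precomposed.count, precomposed.utf8.count)   // 4 5
print(decomposed.count, decomposed.utf8.count)     // 4 6

// Reaching the 4th Character means walking forward from startIndex,
// because the byte offset of that Character differs between the two strings.
let indexA = precomposed.index(precomposed.startIndex, offsetBy: 3)
let indexB = decomposed.index(decomposed.startIndex, offsetBy: 3)
print(precomposed[indexA] == decomposed[indexB])   // true
```

Because the last Character occupies two bytes in one string and three in the other, an Int offset cannot be turned into a memory position without scanning.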

When converting to an array, this traversal is done once, and the result is an array of characters that are equally spaced in memory, which is what allows constant-time subscripting by an Int index. Importantly, the array is kept around, so many subscripting operations can be done on the same array while requiring only one traversal of the String's bytes, for the initial unpacking.
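A minimal sketch of that pattern, assuming the string is read many times and not mutated in between (the sample text and variable names are made up for the example):

```swift
let s = "caf\u{65}\u{301} au lait"

// One O(n) pass over the string's bytes to find every Character boundary.
let characters = Array(s)

// Each lookup below is O(1): no further traversal of the original string.
let first = characters[0]
let fourth = characters[3]
let last = characters[characters.count - 1]
print(first, fourth, last)

// Rebuild a String from a slice of the array when one is needed again.
let word = String(characters[0..<4])
print(word == "caf\u{E9}")   // true
```

The trade-off is memory: the array stores one fixed-size Character per element, so it is usually larger than the packed string it came from.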

It is possible to extend String with a subscript that indexes it by an Int, and you see this come up often on Stack Overflow, but it is ill-advised. The standard library authors could have added it, but they purposely chose not to, because it obscures the fact that every indexing operation requires a separate traversal of the String's bytes, which is O(string.count). All of a sudden, innocuous code like this:

// Assumes a hypothetical Int subscript added to String by an extension.
for i in 0..<string.count {
    print(string[i]) // Looks O(1), but is actually O(string.count)!
}

becomes quadratic.
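To make the cost concrete, here is one way such an extension might look, next to two linear alternatives. The subscript(offset:) label is hypothetical and is not part of the standard library; it is written out only to show where the hidden traversal happens.

```swift
extension String {
    // Hypothetical convenience subscript, shown only to illustrate the cost.
    // Each call walks forward from startIndex, so it is O(offset), not O(1).
    subscript(offset offset: Int) -> Character {
        return self[index(startIndex, offsetBy: offset)]
    }
}

let string = "caf\u{E9}"

// Quadratic overall: every iteration re-walks the string from the start.
var slow: [Character] = []
for i in 0..<string.count {
    slow.append(string[offset: i])
}

// Linear: iterate over the Characters directly...
var fast: [Character] = []
for character in string {
    fast.append(character)
}

// ...or walk the String.Index values, which already mark the boundaries.
var viaIndices: [Character] = []
for index in string.indices {
    viaIndices.append(string[index])
}

print(slow == fast, fast == viaIndices)   // true true
```

All three loops produce the same Characters; only the first repeats the boundary search on every step, which is the cost String.Index is designed to keep visible.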
