
Swift String.Index vs transforming the String to an Array

The Swift documentation says that strings are indexed with String.Index because different characters can take up different amounts of memory.

But I have seen a lot of people convert a String into an array with var a = Array(s) so that they can index by Int instead of String.Index (which is definitely easier).

So I wanted to test for myself whether the two behave the same for all Unicode characters:

let cafeA = "caf\u{E9}" // eAcute
let cafeB = "caf\u{65}\u{301}" // combinedEAcute

let arrayCafeA = Array(cafeA)
let arrayCafeB = Array(cafeB)

print("\(cafeA) is \(cafeA.count) character \(arrayCafeA.count)")
print("\(cafeB) is \(cafeB.count) character \(arrayCafeB.count)")
print(cafeA == cafeB)

print("- A scalar")
for scalar in cafeA.unicodeScalars {
    print(scalar.value)
}
print("- B scalar")
for scalar in cafeB.unicodeScalars {
    print(scalar.value)
}

And here is the output:

café is 4 character 4
café is 4 character 4
true
- A scalar
99
97
102
233
- B scalar
99
97
102
101
769

And sure enough, as mentioned in the docs, strings are just an array of Character, and the grapheme cluster is handled inside the Character object. So why aren't they indexed by Int? What is the point of creating and using String.Index?

In a String, the byte representation is packed, so there's no way to know where the character boundaries are without traversing the whole string from the start.
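As a small illustration of what "packed" means here, the sketch below measures how many UTF-8 bytes a few characters occupy. It assumes Swift 5 or later, where native strings are stored as UTF-8; the sample values and variable names are only for illustration.

```swift
// Each Character is one grapheme cluster, but its UTF-8 encoding varies in length.
let samples: [Character] = ["a", "\u{E9}", "\u{65}\u{301}", "\u{1F600}"]
let byteWidths = samples.map { String($0).utf8.count }
print(byteWidths) // [1, 2, 3, 4]

let word = "caf\u{65}\u{301}!"
print(word.count)      // 5 characters
print(word.utf8.count) // 7 bytes

// Reaching the 4th character means walking forward from startIndex,
// because its byte offset depends on the width of everything before it.
let fourth = word.index(word.startIndex, offsetBy: 3)
print(word[fourth]) // é
```

Because the widths differ, the byte offset of the n-th character cannot be computed from n alone, which is why String.Index exists instead of a plain Int offset.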

When converting to an array, this traversal is done once, and the result is an array of characters that are spaced at equal intervals in memory, which is what allows constant-time subscripting by an Int index. Importantly, the array is kept around, so many subscripting operations can be done on the same array, requiring only one traversal of the String's bytes, for the initial unpacking.
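A minimal sketch of that pattern, with illustrative names: pay for one linear pass up front, then index the resulting array as often as needed. The trade-off is the extra memory for the array.

```swift
let text = "caf\u{65}\u{301} au lait"

// One O(n) traversal of the string's bytes to unpack every Character.
let characters = Array(text)

// Every lookup after that is a constant-time array access.
let fourth = characters[3]
let last = characters[characters.count - 1]

print(characters.count) // 12
print(fourth)           // é
print(last)             // t
```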

It is possible to extend String with a subscript that indexes it by an Int, and you often see this come up on SO, but it's ill-advised. The standard library authors could have added it, but they purposely chose not to, because it obscures the fact that every such indexing operation requires a separate traversal of the String's bytes from the start, which is O(string.count) in the worst case. All of a sudden, innocuous code like this:

for i in 0..<string.count {
    print(string[i]) // Looks O(1), but is actually O(string.count) with an Int subscript!
}

becomes quadratic.
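To make the cost concrete, here is a sketch of what such an extension might look like. The subscript(offset:) name is hypothetical and not part of the standard library; it is labeled only to keep it distinct from the built-in String.Index subscript.

```swift
extension String {
    // Hypothetical Int-based subscript, for illustration only.
    // Each call walks `offset` characters from startIndex, so it is O(offset).
    subscript(offset offset: Int) -> Character {
        return self[index(startIndex, offsetBy: offset)]
    }
}

let string = "caf\u{65}\u{301}"

// Quadratic overall: every iteration restarts the walk from the beginning.
var slow: [Character] = []
for i in 0..<string.count {
    slow.append(string[offset: i])
}

// Linear overall: iterating the string visits each character exactly once.
var fast: [Character] = []
for character in string {
    fast.append(character)
}

print(slow == fast) // true
```

Both loops produce the same characters; only the amount of work differs. When an Int position is needed alongside each character, string.enumerated() provides it while keeping the traversal linear.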
