I need to truncate a (possibly large) Unicode string to a maximum size in bytes. Converting to UTF-16 and then back appears unreliable.
For example:
    let flags = "🇵🇷🇵🇷"
    let result = String(flags.utf16.prefix(3))
In this case result is nil.
I need an efficient way to perform this truncation. Ideas?
A Swift String is built from Unicode scalars (UnicodeScalar), and each scalar can take multiple bytes to store. If you simply take the first n bytes no matter what, chances are that these bytes will not form a valid substring in any encoding when you convert them back.
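To make the size mismatch concrete, here is a small sketch that counts the same text in each of String's views. The variable names are only for illustration, and the scalars are written as escapes so the exact code points are unambiguous.

```swift
// "\u{E9}" is é as a single precomposed scalar.
let eAcute = "\u{E9}"
// Two regional-indicator scalars (P and R) that render as the Puerto Rico flag.
let flag = "\u{1F1F5}\u{1F1F7}"

print(eAcute.unicodeScalars.count)  // 1 scalar
print(eAcute.utf8.count)            // 2 bytes in UTF-8

print(flag.count)                   // 1 Character (grapheme cluster)
print(flag.unicodeScalars.count)    // 2 scalars
print(flag.utf16.count)             // 4 UTF-16 code units (two surrogate pairs)
print(flag.utf8.count)              // 8 bytes in UTF-8
```

Since each flag occupies four UTF-16 code units, utf16.prefix(3) ends in the middle of a surrogate pair, which is presumably why the conversion in the question returns nil.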
Now if you change the definition to "take up to the first n bytes that can form a valid substring", you can use the UTF8View:
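    extension String {
        // Returns the longest UTF-8 prefix of at most `count` bytes that still
        // converts back to a valid String.
        // In Swift 4 and later, prefix(_:) on the UTF-8 view yields Substring.UTF8View.
        func firstBytes(_ count: Int) -> Substring.UTF8View {
            guard count > 0 else { return self.utf8.prefix(0) }
            var actualByteCount = count
            while actualByteCount > 0 {
                let subview = self.utf8.prefix(actualByteCount)
                // String(_:) is failable here: nil means the bytes do not form a valid string
                if String(subview) != nil {
                    return subview
                } else {
                    actualByteCount -= 1
                }
            }
            return self.utf8.prefix(0)
        }
    }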
    let flags = "welcome to 🇵🇷 and 🇺🇸"
    // "welcome to " is exactly 11 bytes
    let bytes1 = flags.firstBytes(11)
    // The Puerto Rico flag takes 8 bytes to store, so 13 bytes cannot cover it
    // and the actual number of bytes returned is 11, same as bytes1
    let bytes2 = flags.firstBytes(13)
    // With 19 bytes (11 + 8) you can cover the string up to and including the Puerto Rico flag
    let bytes3 = flags.firstBytes(19)
    // Wrap each view in Substring so the text itself is printed
    print("'\(Substring(bytes1))'")
    print("'\(Substring(bytes2))'")
    print("'\(Substring(bytes3))'")
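Two caveats about firstBytes, as far as I can tell: the UTF-8 check only guarantees that the prefix ends on a whole scalar, so a budget between 15 and 18 bytes could still keep half of a flag, and each failed attempt repeats the validity check. If you would rather cut on Character (grapheme cluster) boundaries in a single pass, a minimal sketch could look like the following. The name prefixFittingUTF8(maxBytes:) is hypothetical, not a standard library API.

```swift
extension String {
    // Hypothetical helper: longest prefix made of whole Characters whose
    // UTF-8 encoding fits in `maxBytes` bytes.
    func prefixFittingUTF8(maxBytes: Int) -> Substring {
        var byteCount = 0
        var end = startIndex
        while end < endIndex {
            // index(after:) advances by one Character, so a flag is never split.
            let next = index(after: end)
            let size = self[end..<next].utf8.count
            if byteCount + size > maxBytes { break }
            byteCount += size
            end = next
        }
        return self[..<end]
    }
}

// Same sample text as above, with the flags written as scalar escapes.
let sample = "welcome to \u{1F1F5}\u{1F1F7} and \u{1F1FA}\u{1F1F8}"
print("'\(sample.prefixFittingUTF8(maxBytes: 15))'")  // the flag does not fit, so it is dropped whole
print("'\(sample.prefixFittingUTF8(maxBytes: 19))'")  // now the Puerto Rico flag fits
```

The loop stops as soon as the budget is exceeded, so the work is proportional to the bytes kept rather than to the length of the whole string.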