Truncate unicode string to max bytes

Question

I need to truncate a (possibly large) unicode string to a max size in bytes. Converting to UTF-16 and then back appears unreliable.

For example:

let flags = "🇵🇷🇵🇷"
let result = String(flags.utf16.prefix(3))

In this case result is nil.

I need an efficient way to perform this truncation. Ideas?

Answer 1

A Swift String is a collection of Characters (extended grapheme clusters). Each Character is made of one or more Unicode scalars, and each scalar takes 1 to 4 bytes to store in UTF-8. If you just take the first n bytes no matter what, chances are that these bytes will not form a valid string in any encoding when you convert them back.
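To make the size difference concrete, here is a small sketch that counts the same flag at each level. It uses only standard String views; the variable name is just for illustration.

```swift
let flag = "🇵🇷"

// One user-visible Character (grapheme cluster)...
print(flag.count)                 // 1
// ...built from two regional-indicator scalars...
print(flag.unicodeScalars.count)  // 2
// ...each needing a surrogate pair in UTF-16...
print(flag.utf16.count)           // 4
// ...and 4 bytes in UTF-8.
print(flag.utf8.count)            // 8
```

This is also why taking 3 UTF-16 code units in the question fails: the cut lands in the middle of the second scalar's surrogate pair.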

Now if you change the definition to "take up to the first n bytes that can form a valid substring", you can use the UTF8View:

extension String {
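    // In Swift 4 and later, prefix(_:) on a UTF8View returns Substring.UTF8View
    // (in Swift 3 the return type was plain UTF8View).
    func firstBytes(_ count: Int) -> Substring.UTF8View {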
        guard count > 0 else { return self.utf8.prefix(0) }

        var actualByteCount = count
        while actualByteCount > 0 {
            let subview = self.utf8.prefix(actualByteCount)
            if let _ = String(subview) {
                return subview
            } else {
                actualByteCount -= 1
            }
        }

        return self.utf8.prefix(0)
    }
}

let flags = "welcome to 🇵🇷 and 🇺🇸"

let bytes1 = flags.firstBytes(11)

// the Puerto Rico flag takes 8 bytes to store in UTF-8, so a limit of 13 would cut it mid-scalar;
// the actual number of bytes returned is therefore 11, the same as bytes1
let bytes2 = flags.firstBytes(13)

// 19 bytes is enough to cover the string up to and including the Puerto Rico flag
let bytes3 = flags.firstBytes(19)

print("'\(bytes1)'")
print("'\(bytes2)'")
print("'\(bytes3)'")
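One caveat: a cut that is valid at the Unicode scalar level can still fall inside a Character. A flag is two regional-indicator scalars, so a limit of 15 bytes may keep only the first half of 🇵🇷. If you want to cut only between whole Characters, a minimal sketch could look like the following. The method name truncatedToUTF8Bytes is hypothetical, not a standard library API, and this version walks the string once, so it is linear in the length of the kept prefix.

```swift
extension String {
    /// Hypothetical helper: returns the longest prefix made of whole
    /// Characters whose UTF-8 encoding fits in `maxBytes`.
    func truncatedToUTF8Bytes(_ maxBytes: Int) -> String {
        var result = ""
        var usedBytes = 0
        for character in self {
            // UTF-8 size of this whole grapheme cluster
            let size = String(character).utf8.count
            // Stop before the first Character that would exceed the limit
            if usedBytes + size > maxBytes { break }
            result.append(character)
            usedBytes += size
        }
        return result
    }
}

let text = "welcome to 🇵🇷 and 🇺🇸"
print("'\(text.truncatedToUTF8Bytes(11))'")  // 'welcome to '
print("'\(text.truncatedToUTF8Bytes(15))'")  // 'welcome to ' (the flag does not fit)
print("'\(text.truncatedToUTF8Bytes(19))'")  // 'welcome to 🇵🇷'
```

It returns a String rather than a UTF8View, which is usually what you want to store or send anyway. The trade-off is that the result can be a few bytes shorter than the scalar-based version whenever the limit falls inside a multi-scalar Character.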

1 answers

solution1
0 2017-05-30 18:35:05
