
Swift 3: how to convert a UTF8 data stream (1,2,3 or 4 bytes per char) to String?

In my app a TCP client handles a data stream coming from a remote TCP server. Everything works fine as long as the received characters are 1-byte characters. When the TCP server sends special characters such as "ü" (hex "c3bc", a 2-byte character), I start to experience issues.

This is the Swift 3 line of code that returns a nil String whenever the received data includes UTF-8 characters longer than 1 byte:

let convertedString = String(bytes: data, encoding: String.Encoding.utf8)

Any idea how I could fix this? Basically, the incoming stream can include 1-byte or 2-byte characters encoded as UTF-8, and I need to convert the data stream into a String without issues.

Here is the whole portion of code where I'm experiencing the issue:

func startRead(for task: URLSessionStreamTask) {
    task.readData(ofMinLength: 1, maxLength: 65535, timeout: 300) { (data, eof, error) in
        if let data = data {
            NSLog("stream task read %@", data as NSData)

            let convertedString1 = String(data: data, encoding: String.Encoding(rawValue: String.Encoding.utf8.rawValue))

            if let convertedString = String(bytes: data, encoding: String.Encoding.utf8) {

                self.partialMessage = self.partialMessage + convertedString

                NSLog(convertedString)

                // Assign lengths (delimiter, MD5 digest, minimum expected length, message length)
                let delimiterLength = Constants.END_OF_MESSAGE_DELIMITER.lengthOfBytes(using: String.Encoding.utf8)
                let MD5Length = 32 // 32 characters -> hex representation of 16 bytes
                // 3 = CR+LF+1 char at least
                let minimumExpectedMessageLength = MD5Length + delimiterLength + 3
                let messageLength = self.partialMessage.lengthOfBytes(using: String.Encoding.utf8)

                // Check for delimiter and minimum expected message length (2 char msg + MD5 digest + delimiter)
                if (self.partialMessage.contains(Constants.END_OF_MESSAGE_DELIMITER)) &&
                    (messageLength >= minimumExpectedMessageLength) {

                    var message = self.partialMessage

                    // Get rid of optional CR+LF
                    var lowBound = message.index(message.endIndex, offsetBy: -1)
                    var hiBound = message.index(message.endIndex, offsetBy: 0)
                    var midRange = lowBound ..< hiBound

                    let optionalCRLF = message.substring(with: midRange)

                    if (optionalCRLF == "\r\n") || (optionalCRLF == "\0") {  // Remove CR+LF if present
                        lowBound = message.index(message.endIndex, offsetBy: -1)
                        hiBound = message.index(message.endIndex, offsetBy: 0)
                        midRange = lowBound ..< hiBound
                        message.removeSubrange(midRange)
                    }

                    // Check for delimiter proper position (has to be at the end)
                    lowBound = message.index(message.endIndex, offsetBy: -delimiterLength)
                    hiBound = message.index(message.endIndex, offsetBy: 0)
                    midRange = lowBound ..< hiBound

                    let delimiter = message.substring(with: midRange)

                    if (delimiter == Constants.END_OF_MESSAGE_DELIMITER)  // Delimiter in proper position?
                    {
                        // Acquire the MD digest
                        lowBound = message.index(message.endIndex, offsetBy: -(MD5Length+delimiterLength))
                        hiBound = message.index(message.endIndex, offsetBy: -(delimiterLength))
                        midRange = lowBound ..< hiBound
                        let receivedMD5 = message.substring(with: midRange)

                        // Acquire the deframed message (normalized message)
                        lowBound = message.index(message.startIndex, offsetBy: 0)
                        hiBound = message.index(message.endIndex, offsetBy: -(MD5Length+delimiterLength))
                        midRange = lowBound ..< hiBound
                        let normalizedMessage = message.substring(with: midRange)

                        // Calculate the MD5 digest on the normalized message
                        let calculatedMD5Digest = normalizedMessage.md5()

                        // Debug
                        print(delimiter)
                        print(normalizedMessage)
                        print(receivedMD5)
                        print(calculatedMD5Digest!)

                        // Check for the integrity of the data
                        if (receivedMD5.lowercased() == calculatedMD5Digest?.lowercased()) || self.noMD5Check  // TEMPORARY
                        {
                            if (normalizedMessage == "Unauthorized Access")
                            {
                                // Update the authorization status
                                self.authorized = false

                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }

                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)

                                // Shows an alert
                                self.showAlert(title: NSLocalizedString("Unauthorized Access", comment: "Unauthorized Access Title"), message: NSLocalizedString("Please login with the proper Username and Password before to send any command!", comment: "Unauthorized Access Message"))                                    
                            }
                            else if (normalizedMessage == "System Busy")
                            {
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }

                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)

                                // Shows an alert
                                self.showAlert(title: NSLocalizedString("System Busy", comment: "System Busy Title"), message: NSLocalizedString("The system is busy at the moment. Only one connection at a time is allowed!", comment: "System Busy Message"))
                            }
                            else if (normalizedMessage == "Error")
                            {
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }

                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)

                                // Shows an alert
                                self.showAlert(title: NSLocalizedString("Error", comment: "Error Title"), message: NSLocalizedString("An error occurred during the execution of the command!", comment: "Command Error Message"))
                            }
                            else if (normalizedMessage == "ErrorMachineRunning")
                            {
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }

                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)

                                // Shows an alert
                                self.showAlert(title: NSLocalizedString("Error", comment: "Error Title"), message: NSLocalizedString("The command cannot be executed while the machine is running", comment: "Machine Running Message 1")+"!\r\n\n "+NSLocalizedString("Trying to execute any command in this state could be dangerous for both people and machinery", comment: "Machine Running Message 2")+".\r\n\n "+NSLocalizedString("Please stop the machine and leave the automatic or semi-automatic modes before to provide any command", comment: "Machine Running Message 3")+".")
                            }
                            else if (normalizedMessage == "Command Not Recognized")
                            {
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }

                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)

                                // Shows an alert
                                self.showAlert(title: NSLocalizedString("Error", comment: "Error Title"), message: NSLocalizedString("Command not recognized!", comment: "Command Unrecognized Message"))
                            }
                            else
                            {
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }

                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)

                                //let testMessage = "test\r\nf3ea0b9bff4a2c79e60acf6873f4a1ce</EOM>\r\n"
                                //normalizedMessage = testMessage

                                // Process the received csv file
                                self.processCsvData(file: normalizedMessage)
                            }
                        }
                        else
                        {
                            // Stop the refresh control
                            if let refreshControl = self.refreshControl {
                                if refreshControl.isRefreshing {
                                    refreshControl.endRefreshing()
                                }
                            }

                            // Stop the stream
                            NSLog("stream task stop")
                            self.stop(task: task)

                            // Shows an alert
                            self.showAlert(title: NSLocalizedString("Data Error", comment: "Data Error Title"), message: NSLocalizedString("The received data cannot be read since it's corrupted or incomplete!", comment: "Data Error Message"))
                        }

                    }
                    else
                    {
                        // Stop the refresh control
                        if let refreshControl = self.refreshControl {
                            if refreshControl.isRefreshing {
                                refreshControl.endRefreshing()
                            }
                        }

                        // Stop the stream
                        NSLog("stream task stop")
                        self.stop(task: task)

                        // Shows an alert
                        self.showAlert(title: NSLocalizedString("Data Error", comment: "Data Error Title"), message: NSLocalizedString("The received data cannot be read since it's corrupted or incomplete!", comment: "Data Error Message"))
                    }
                }
            }
        }
        if eof {
            // Stop the refresh control
            if let refreshControl = self.refreshControl {
                if refreshControl.isRefreshing {
                    refreshControl.endRefreshing()
                }
            }

            // Refresh the tableview content
            self.tableView.reloadData()

            // Stop the stream
            NSLog("stream task end")
            self.stop(task: task)

        } else if error == nil {
            self.startRead(for: task)
        } else {
            // We ignore the error because we'll see it again in `didCompleteWithError`.
            NSLog("stream task read error")
        }
    }
}

It's critical that data represents the data for the entire string, not just a substring. If you are attempting to convert substrings from partial data of the entire string, it will fail in many cases.

It works with 1-byte characters because no matter where you chop the data stream, the partial data still represents a valid string. But once you start dealing with multi-byte characters, a partial data stream could easily result in the first or last byte of the data being only part of a multi-byte character. This prevents the data from being interpreted properly.

So you must ensure that you build up a data object with all of the bytes of a given string before attempting to convert the data into a string.

Normally you should start your data with a byte count. Say the first 4 bytes represent a 32-bit integer in some agreed upon "endianness". You read those 4 bytes to get the length. Then you read data until you get that many more bytes. Then you know you are at the end of the message.
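The length-prefix approach can be sketched as follows. This is only an illustration under assumptions that are not in the question: the wire format (a 4-byte big-endian length followed by that many UTF-8 bytes) and the type name `LengthPrefixedDecoder` are hypothetical, and the server would have to send this format.

```swift
import Foundation

/// Hypothetical decoder: collects raw bytes and returns only complete messages.
/// Assumed frame layout: 4-byte big-endian payload length, then the UTF-8 payload.
struct LengthPrefixedDecoder {
    private var buffer = Data()

    /// Appends a chunk from the stream and returns every message completed by it.
    mutating func append(_ chunk: Data) -> [String] {
        buffer.append(chunk)
        var messages: [String] = []
        while buffer.count >= 4 {
            let start = buffer.startIndex
            // Assemble the big-endian length byte by byte (no alignment assumptions).
            var length = 0
            for byte in buffer[start ..< start + 4] {
                length = (length << 8) | Int(byte)
            }
            // Wait for more data if the payload has not fully arrived yet.
            guard buffer.count >= 4 + length else { break }
            let payload = buffer.subdata(in: start + 4 ..< start + 4 + length)
            buffer = buffer.subdata(in: start + 4 + length ..< buffer.endIndex)
            // Decode only now, when all bytes of the string are present.
            // A payload that is not valid UTF-8 is skipped in this sketch.
            if let text = String(data: payload, encoding: .utf8) {
                messages.append(text)
            }
        }
        return messages
    }
}
```

In the question's code, `append(_:)` would be called with each `data` chunk inside the `readData` completion handler, and the string processing would run only on the returned messages.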

The problem with trying to use an "end of message" marker at the end of your data is that the "end of message" marker could be split across reads. Either way, you need to refactor your code to process at the data level and not make any attempt to convert the data to a string until all of the string data is read.
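If the marker-based protocol has to stay, the search for the marker can still be done on the accumulated bytes rather than on strings. The sketch below assumes a hypothetical `DelimitedDecoder` type; the marker value is supplied by the caller (the commented-out test message in the question suggests "</EOM>", but the real constant is not shown).

```swift
import Foundation

/// Hypothetical decoder: buffers raw bytes and splits them on a byte-level marker.
struct DelimitedDecoder {
    private var buffer = Data()
    private let delimiter: Data

    init(delimiter: String) {
        self.delimiter = Data(delimiter.utf8)
    }

    /// Appends a chunk and returns every message whose marker has fully arrived.
    mutating func append(_ chunk: Data) -> [String] {
        buffer.append(chunk)
        var messages: [String] = []
        // Searching the whole buffer finds markers that were split across reads.
        while let found = buffer.range(of: delimiter) {
            let payload = buffer.subdata(in: buffer.startIndex ..< found.lowerBound)
            buffer = buffer.subdata(in: found.upperBound ..< buffer.endIndex)
            // Convert to String only once the complete message is available.
            // A payload that is not valid UTF-8 is skipped in this sketch.
            if let text = String(data: payload, encoding: .utf8) {
                messages.append(text)
            }
        }
        return messages
    }
}
```

Because the buffer keeps everything that has not yet been matched, neither a marker nor a multi-byte character that straddles two reads is lost.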

As you know, a single UTF-8 character occupies 1, 2, 3 or 4 bytes. In your case, you need to handle 1- and 2-byte characters, and the byte sequence you receive may not be aligned to a character boundary. However, as rmaddy pointed out, the byte sequence passed to String.Encoding.utf8 decoding must start and end on a character boundary.

There are two options to handle this situation. One is, as rmaddy suggests, to send the length first and count the incoming data bytes. The drawback is that you have to modify the transmitting (server) side as well, which may not be possible.

Another option is to scan the incoming sequence byte by byte, keep track of the character boundaries, and build up a legitimate UTF-8 byte sequence. Fortunately, UTF-8 is designed so that you can identify where a character boundary is by looking at ANY byte in the stream. Specifically, the first byte of a 1-, 2-, 3- or 4-byte UTF-8 character has the bit pattern 0xxxxxxx, 110xxxxx, 1110xxxx or 11110xxx respectively, and the second to fourth bytes all have the pattern 10xxxxxx. This makes your life a lot easier.
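The bit patterns above are enough to cut a chunk at the last complete character and carry the leftover bytes over to the next read. A minimal sketch, with hypothetical function names (`utf8SequenceLength`, `splitAtCharacterBoundary`) that are not part of any library:

```swift
import Foundation

/// Length of a UTF-8 sequence judged from its first byte.
/// Returns nil for a continuation byte (10xxxxxx) or a byte that cannot start a sequence.
func utf8SequenceLength(leadByte: UInt8) -> Int? {
    switch leadByte {
    case 0x00...0x7F: return 1   // 0xxxxxxx
    case 0xC0...0xDF: return 2   // 110xxxxx
    case 0xE0...0xEF: return 3   // 1110xxxx
    case 0xF0...0xF7: return 4   // 11110xxx
    default: return nil
    }
}

/// Splits `data` into a prefix that ends on a character boundary and the
/// trailing bytes of a character that has not fully arrived yet.
func splitAtCharacterBoundary(_ data: Data) -> (complete: Data, remainder: Data) {
    let bytes = [UInt8](data)
    // The last lead byte is at most 4 bytes from the end of valid UTF-8.
    let lowerLimit = max(0, bytes.count - 4)
    var i = bytes.count
    while i > lowerLimit {
        i -= 1
        if let needed = utf8SequenceLength(leadByte: bytes[i]) {
            let available = bytes.count - i
            // Cut before the lead byte only if its sequence is still incomplete.
            let cut = available < needed ? i : bytes.count
            return (Data(bytes[..<cut]), Data(bytes[cut...]))
        }
    }
    // No lead byte near the end: leave the data untouched and let decoding decide.
    return (data, Data())
}
```

The `complete` part can be passed to `String(data:encoding:)`, while `remainder` is prepended to the next chunk before splitting again. The sketch only checks the bit patterns; it does not validate the bytes otherwise, so decoding can still return nil for malformed input.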

If you pick your "end of message" marker from the 1-byte UTF-8 characters, you can detect EOM reliably without tracking byte sequences at all, since the marker is a single byte (below 0x80) and such a byte never appears inside a 2- to 4-byte character.
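With a single-byte marker, the scan reduces to looking for one byte value in the buffered data. In this sketch the marker 0x0A (line feed) and the function name `popMessage` are assumptions chosen for illustration; any byte below 0x80 that cannot occur in the payload would do.

```swift
import Foundation

/// Removes and returns the first complete message from `buffer`, or nil if the
/// terminator byte has not arrived yet. The terminator must be below 0x80.
func popMessage(from buffer: inout Data, terminator: UInt8 = 0x0A) -> String? {
    // A byte below 0x80 can only be a whole 1-byte character, never part of a longer one.
    guard let end = buffer.firstIndex(of: terminator) else { return nil }
    let payload = buffer.subdata(in: buffer.startIndex ..< end)
    buffer = buffer.subdata(in: end + 1 ..< buffer.endIndex)
    return String(data: payload, encoding: .utf8)
}
```

Each received chunk is appended to the buffer first, and `popMessage` is then called repeatedly until it returns nil.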


 