Swift 3: How to convert a UTF-8 data stream (1, 2, 3, or 4 bytes per character) to a String?

In my app a TCP client handles a data stream coming from a remote TCP server. Everything works fine while the received characters are 1-byte characters. When the TCP server sends special characters such as "ü" (hex "c3bc", a 2-byte character), I start to experience issues.

This is the Swift 3 line of code that returns a nil String whenever the received data includes UTF-8 characters longer than 1 byte:

let convertedString = String(bytes: data, encoding: String.Encoding.utf8)

Any idea how I could fix this? Basically, the incoming stream can include 1-byte or 2-byte characters encoded as UTF-8, and I need to convert the data stream into a String without issues.

Here is the whole portion of code where I'm experiencing the issue:

func startRead(for task: URLSessionStreamTask) {
    task.readData(ofMinLength: 1, maxLength: 65535, timeout: 300) { (data, eof, error) in
        if let data = data {
            NSLog("stream task read %@", data as NSData)

            let convertedString1 = String(data: data, encoding: String.Encoding(rawValue: String.Encoding.utf8.rawValue))

            if let convertedString = String(bytes: data, encoding: String.Encoding.utf8) {

                self.partialMessage = self.partialMessage + convertedString

                NSLog(convertedString)

                // Assign lengths (delimiter, MD5 digest, minimum expected length, message length)
                let delimiterLength = Constants.END_OF_MESSAGE_DELIMITER.lengthOfBytes(using: String.Encoding.utf8)
                let MD5Length = 32 // 32 characters -> hex representation of 16 bytes
                // 3 = CR+LF+1 char at least
                let minimumExpectedMessageLength = MD5Length + delimiterLength + 3
                let messageLength = self.partialMessage.lengthOfBytes(using: String.Encoding.utf8)

                // Check for delimiter and minimum expected message length (2 char msg + MD5 digest + delimiter)
                if (self.partialMessage.contains(Constants.END_OF_MESSAGE_DELIMITER)) &&
                    (messageLength >= minimumExpectedMessageLength) {

                    var message = self.partialMessage

                    // Get rid of optional CR+LF
                    var lowBound = message.index(message.endIndex, offsetBy: -1)
                    var hiBound = message.index(message.endIndex, offsetBy: 0)
                    var midRange = lowBound ..< hiBound

                    let optionalCRLF = message.substring(with: midRange)

                    if (optionalCRLF == "\r\n") || (optionalCRLF == "\0") {  // Remove CR+LF if present
                        lowBound = message.index(message.endIndex, offsetBy: -1)
                        hiBound = message.index(message.endIndex, offsetBy: 0)
                        midRange = lowBound ..< hiBound
                        message.removeSubrange(midRange)
                    }

                    // Check for delimiter proper position (has to be at the end)
                    lowBound = message.index(message.endIndex, offsetBy: -delimiterLength)
                    hiBound = message.index(message.endIndex, offsetBy: 0)
                    midRange = lowBound ..< hiBound

                    let delimiter = message.substring(with: midRange)

                    if (delimiter == Constants.END_OF_MESSAGE_DELIMITER)  // Delimiter in proper position?
                    {
                        // Acquire the MD digest
                        lowBound = message.index(message.endIndex, offsetBy: -(MD5Length+delimiterLength))
                        hiBound = message.index(message.endIndex, offsetBy: -(delimiterLength))
                        midRange = lowBound ..< hiBound
                        let receivedMD5 = message.substring(with: midRange)

                        // Acquire the deframed message (normalized message)
                        lowBound = message.index(message.startIndex, offsetBy: 0)
                        hiBound = message.index(message.endIndex, offsetBy: -(MD5Length+delimiterLength))
                        midRange = lowBound ..< hiBound
                        let normalizedMessage = message.substring(with: midRange)

                        // Calculate the MD5 digest on the normalized message
                        let calculatedMD5Digest = normalizedMessage.md5()

                        // Debug
                        print(delimiter)
                        print(normalizedMessage)
                        print(receivedMD5)
                        print(calculatedMD5Digest!)

                        // Check for the integrity of the data
                        if (receivedMD5.lowercased() == calculatedMD5Digest?.lowercased()) || self.noMD5Check  // TEMPORARY
                        {
                            if (normalizedMessage == "Unauthorized Access")
                            {
                                // Update the authorization status
                                self.authorized = false

                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }

                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)

                                // Shows an alert
                                self.showAlert(title: NSLocalizedString("Unauthorized Access", comment: "Unauthorized Access Title"), message: NSLocalizedString("Please login with the proper Username and Password before to send any command!", comment: "Unauthorized Access Message"))                                    
                            }
                            else if (normalizedMessage == "System Busy")
                            {
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }

                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)

                                // Shows an alert
                                self.showAlert(title: NSLocalizedString("System Busy", comment: "System Busy Title"), message: NSLocalizedString("The system is busy at the moment. Only one connection at a time is allowed!", comment: "System Busy Message"))
                            }
                            else if (normalizedMessage == "Error")
                            {
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }

                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)

                                // Shows an alert
                                self.showAlert(title: NSLocalizedString("Error", comment: "Error Title"), message: NSLocalizedString("An error occurred during the execution of the command!", comment: "Command Error Message"))
                            }
                            else if (normalizedMessage == "ErrorMachineRunning")
                            {
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }

                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)

                                // Shows an alert
                                self.showAlert(title: NSLocalizedString("Error", comment: "Error Title"), message: NSLocalizedString("The command cannot be executed while the machine is running", comment: "Machine Running Message 1")+"!\r\n\n "+NSLocalizedString("Trying to execute any command in this state could be dangerous for both people and machinery", comment: "Machine Running Message 2")+".\r\n\n "+NSLocalizedString("Please stop the machine and leave the automatic or semi-automatic modes before to provide any command", comment: "Machine Running Message 3")+".")
                            }
                            else if (normalizedMessage == "Command Not Recognized")
                            {
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }

                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)

                                // Shows an alert
                                self.showAlert(title: NSLocalizedString("Error", comment: "Error Title"), message: NSLocalizedString("Command not recognized!", comment: "Command Unrecognized Message"))
                            }
                            else
                            {
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }

                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)

                                //let testMessage = "test\r\nf3ea0b9bff4a2c79e60acf6873f4a1ce</EOM>\r\n"
                                //normalizedMessage = testMessage

                                // Process the received csv file
                                self.processCsvData(file: normalizedMessage)
                            }
                        }
                        else
                        {
                            // Stop the refresh control
                            if let refreshControl = self.refreshControl {
                                if refreshControl.isRefreshing {
                                    refreshControl.endRefreshing()
                                }
                            }

                            // Stop the stream
                            NSLog("stream task stop")
                            self.stop(task: task)

                            // Shows an alert
                            self.showAlert(title: NSLocalizedString("Data Error", comment: "Data Error Title"), message: NSLocalizedString("The received data cannot be read since it's corrupted or incomplete!", comment: "Data Error Message"))
                        }

                    }
                    else
                    {
                        // Stop the refresh control
                        if let refreshControl = self.refreshControl {
                            if refreshControl.isRefreshing {
                                refreshControl.endRefreshing()
                            }
                        }

                        // Stop the stream
                        NSLog("stream task stop")
                        self.stop(task: task)

                        // Shows an alert
                        self.showAlert(title: NSLocalizedString("Data Error", comment: "Data Error Title"), message: NSLocalizedString("The received data cannot be read since it's corrupted or incomplete!", comment: "Data Error Message"))
                    }
                }
            }
        }
        if eof {
            // Stop the refresh control
            if let refreshControl = self.refreshControl {
                if refreshControl.isRefreshing {
                    refreshControl.endRefreshing()
                }
            }

            // Refresh the tableview content
            self.tableView.reloadData()

            // Stop the stream
            NSLog("stream task end")
            self.stop(task: task)

        } else if error == nil {
            self.startRead(for: task)
        } else {
            // We ignore the error because we'll see it again in `didCompleteWithError`.
            NSLog("stream task read error")
        }
    }
}

It's critical that data represents the data for the entire string, not just a substring. If you attempt to convert substrings from partial data of the entire string, it will fail in many cases.

It works with 1-byte characters because no matter where you chop the data stream, the partial data still represents a valid string. But once you start dealing with multi-byte characters, a partial data stream can easily begin or end in the middle of a multi-byte character. This prevents the data from being interpreted properly.
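
The failure can be reproduced without any networking. The following is a minimal sketch, assuming a read happens to end between the two bytes of "ü"; the variable names are illustrative and do not come from the question's code.

```swift
import Foundation

// "fü" encoded as UTF-8: 0x66 for "f", then 0xC3 0xBC for "ü".
let whole: [UInt8] = [0x66, 0xC3, 0xBC]

// Simulate two reads where the first one ends in the middle of "ü".
let firstChunk = Array(whole[0..<2])    // 0x66 0xC3
let secondChunk = Array(whole[2...])    // 0xBC

// Each chunk on its own is not valid UTF-8, so the conversion fails.
let a = String(bytes: firstChunk, encoding: .utf8)    // truncated 2-byte sequence
let b = String(bytes: secondChunk, encoding: .utf8)   // lone continuation byte

// Converting the reassembled bytes succeeds.
let c = String(bytes: firstChunk + secondChunk, encoding: .utf8)
```

Here `a` and `b` are both nil, while `c` holds the full string, which is why the conversion has to wait until the bytes are complete.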

So you must ensure that you build up a data object with all of the bytes of a given string before attempting to convert the data into a string.

Normally you should start your data with a byte count. Say the first 4 bytes represent a 32-bit integer in some agreed-upon "endianness". You read those 4 bytes to get the length. Then you read data until you have received that many more bytes. Then you know you are at the end of the message.
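
A length-prefixed reader along these lines might look like the sketch below. It assumes a hypothetical wire format of a 4-byte big-endian length followed by the UTF-8 payload, which the server in the question does not currently send; `LengthPrefixedDecoder` and its methods are illustrative names.

```swift
import Foundation

/// Accumulates raw bytes and extracts length-prefixed messages.
/// Assumed (hypothetical) format: 4-byte big-endian length, then UTF-8 payload.
struct LengthPrefixedDecoder {
    private var buffer = Data()

    /// Adds the bytes of one read to the internal buffer.
    mutating func append(_ chunk: Data) {
        buffer.append(chunk)
    }

    /// Returns the next complete message, or nil if more bytes are needed.
    /// Note: nil is also returned when a complete payload is not valid UTF-8.
    mutating func nextMessage() -> String? {
        let headerSize = 4
        guard buffer.count >= headerSize else { return nil }

        // Assemble the big-endian length one byte at a time.
        var length = 0
        for byte in buffer.prefix(headerSize) {
            length = (length << 8) | Int(byte)
        }

        // Wait until the whole payload has arrived.
        guard buffer.count >= headerSize + length else { return nil }

        let start = buffer.startIndex + headerSize
        let payload = buffer.subdata(in: start ..< start + length)
        buffer.removeSubrange(buffer.startIndex ..< start + length)

        // Only now, with every byte present, convert to a String.
        return String(data: payload, encoding: .utf8)
    }
}

// Usage: the payload "fü" (3 bytes) arrives split across two reads.
var lengthDecoder = LengthPrefixedDecoder()
lengthDecoder.append(Data([0x00, 0x00, 0x00, 0x03, 0x66, 0xC3]))
let early = lengthDecoder.nextMessage()     // payload incomplete
lengthDecoder.append(Data([0xBC]))
let complete = lengthDecoder.nextMessage()  // payload complete
```

In a read callback such as the one in the question, each received `data` would be passed to `append` and `nextMessage` called until it returns nil.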

The problem with trying to use an "end of message" marker at the end of your data is that the "end of message" marker could be split across reads. Either way, you need to refactor your code to process at the data level and not make any attempt to convert the data to a string until all of the string data is read.
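
If the delimiter has to stay, the search for it can still be done on the raw bytes. The sketch below assumes the delimiter is the "</EOM>" string that appears in the commented-out test message of the question; `DelimitedDecoder` is an illustrative name, and the MD5 and CR+LF handling from the question are left out.

```swift
import Foundation

/// Buffers raw bytes and yields a String only once the delimiter has arrived.
struct DelimitedDecoder {
    private var buffer = Data()
    private let delimiter: Data

    init(delimiter: String) {
        self.delimiter = Data(delimiter.utf8)
    }

    /// Adds the bytes of one read to the internal buffer.
    mutating func append(_ chunk: Data) {
        buffer.append(chunk)
    }

    /// Returns the text before the next delimiter, or nil if no complete
    /// message is buffered yet (or if its bytes are not valid UTF-8).
    mutating func nextMessage() -> String? {
        // The whole buffer is searched, so a delimiter that was split
        // across two reads is still found once both parts have arrived.
        guard let range = buffer.range(of: delimiter) else { return nil }
        let payload = buffer.subdata(in: buffer.startIndex ..< range.lowerBound)
        buffer.removeSubrange(buffer.startIndex ..< range.upperBound)
        return String(data: payload, encoding: .utf8)
    }
}

// Usage: both the text and the delimiter are split across reads.
var delimitedDecoder = DelimitedDecoder(delimiter: "</EOM>")
delimitedDecoder.append(Data("gr\u{FC}n</E".utf8))
let beforeDelimiter = delimitedDecoder.nextMessage()
delimitedDecoder.append(Data("OM>".utf8))
let afterDelimiter = delimitedDecoder.nextMessage()
```

Because the conversion happens only after the delimiter is found, the bytes handed to `String(data:encoding:)` always end on a character boundary.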

As you know, a single UTF-8 character takes 1, 2, 3, or 4 bytes. In your case, you need to handle 1- or 2-byte characters. Your received byte sequence may not be aligned to a character boundary. However, as rmaddy pointed out, the byte sequence passed to String.Encoding.utf8 must start and end on a character boundary.

Now, there are two options for handling this situation. One is, as rmaddy suggests, to send the length first and count incoming data bytes. The drawback is that you have to modify the transmitting (server) side as well, which may not be possible.

Another option is to scan the incoming sequence byte by byte and keep track of the character boundary, then build up a legitimate UTF-8 byte sequence. Fortunately, UTF-8 is designed so that you can easily identify where the character boundary is by looking at any byte in the byte stream. Specifically, the first byte of a 1-, 2-, 3-, or 4-byte UTF-8 character has the bit pattern 0xxxxxxx, 110xxxxx, 1110xxxx, or 11110xxx respectively, and the second through fourth bytes all have the pattern 10xxxxxx. This makes your life a lot easier.
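
The boundary tracking described above can be sketched as follows. Only the tail of the buffer is inspected, and full validation is left to the String initializer; `completeUTF8PrefixLength` and `UTF8StreamDecoder` are illustrative names, not part of Foundation.

```swift
import Foundation

/// Number of leading bytes of `bytes` that end on a UTF-8 character boundary.
/// Only the last few bytes are inspected; the rest is not validated here.
func completeUTF8PrefixLength(_ bytes: [UInt8]) -> Int {
    let n = bytes.count
    // The lead byte of an incomplete character is within the last 4 bytes.
    var i = n - 1
    while i >= 0, i >= n - 4 {
        let byte = bytes[i]
        if byte & 0b1100_0000 != 0b1000_0000 {
            // Not a continuation byte (10xxxxxx), so this is a lead byte.
            let expected: Int
            if byte & 0b1000_0000 == 0 { expected = 1 }                  // 0xxxxxxx
            else if byte & 0b1110_0000 == 0b1100_0000 { expected = 2 }   // 110xxxxx
            else if byte & 0b1111_0000 == 0b1110_0000 { expected = 3 }   // 1110xxxx
            else if byte & 0b1111_1000 == 0b1111_0000 { expected = 4 }   // 11110xxx
            else { expected = 1 }  // invalid lead byte; the decoder will reject it
            // If the last character is incomplete, cut just before its lead byte.
            return i + expected <= n ? n : i
        }
        i -= 1
    }
    // No lead byte near the end: hold nothing back and let the decoder decide.
    return n
}

/// Decodes chunks as they arrive, keeping an incomplete trailing character.
struct UTF8StreamDecoder {
    private var pending: [UInt8] = []

    /// Returns the text that can be decoded so far, or nil for malformed bytes.
    mutating func decode(_ chunk: Data) -> String? {
        pending.append(contentsOf: chunk)
        let cut = completeUTF8PrefixLength(pending)
        let text = String(bytes: pending[..<cut], encoding: .utf8)
        pending.removeFirst(cut)
        return text
    }
}

// Usage: "fü" arrives split in the middle of "ü".
var streamDecoder = UTF8StreamDecoder()
let firstText = streamDecoder.decode(Data([0x66, 0xC3]))
let secondText = streamDecoder.decode(Data([0xBC]))
```

With this approach the server does not need to change, and each read yields as much text as is safely decodable at that point.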

If you pick your "end of message" marker from the 1-byte UTF-8 characters, you can easily and reliably detect EOM without considering the byte sequence, since it is a single byte and never appears anywhere inside a 2- to 4-byte character.
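
As a sketch of that idea, the function below splits a buffer on a single-byte marker. The marker value 0x0A (line feed) is a hypothetical choice and must be below 0x80 for the reasoning above to hold; `extractMessages` is an illustrative name.

```swift
import Foundation

/// Removes every complete message from `buffer` and returns them as Strings.
/// `marker` is a single-byte end-of-message value (hypothetical default: 0x0A).
func extractMessages(from buffer: inout Data, marker: UInt8 = 0x0A) -> [String] {
    var messages: [String] = []
    // A byte below 0x80 never occurs inside a multi-byte character,
    // so a plain byte search cannot split a character.
    while let end = buffer.firstIndex(of: marker) {
        let payload = buffer.subdata(in: buffer.startIndex ..< end)
        buffer.removeSubrange(buffer.startIndex ..< end + 1)
        if let text = String(data: payload, encoding: .utf8) {
            messages.append(text)
        }
    }
    return messages
}

// Usage: "fü" plus marker arrives in two reads, followed by the start of
// the next message.
var received = Data([0x66, 0xC3])
let none = extractMessages(from: &received)
received.append(contentsOf: [0xBC, 0x0A, 0x61])
let found = extractMessages(from: &received)
```

Any bytes after the last marker stay in the buffer until the rest of that message arrives.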
