Swift 3: how to convert a UTF-8 data stream (1, 2, 3 or 4 bytes per char) to String?
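In my app, a TCP client handles a stream of data coming from a remote TCP server. Everything works fine as long as the received characters are 1-byte characters. The problems start when the TCP server sends special characters such as "ü" (hex "c3bc", a 2-byte character).
This is the Swift 3 line that yields a nil String whenever the received data contains UTF-8 characters longer than 1 byte: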
let convertedString = String(bytes: data, encoding: String.Encoding.utf8)
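Any idea how to fix this? Basically, the incoming stream can contain 1-byte or 2-byte characters encoded as UTF-8, and I need to convert the stream to a String without problems.
This is the whole portion of code where I'm having the issue: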
func startRead(for task: URLSessionStreamTask) {
    task.readData(ofMinLength: 1, maxLength: 65535, timeout: 300) { (data, eof, error) in
        if let data = data {
            NSLog("stream task read %@", data as NSData)
            let convertedString1 = String(data: data, encoding: String.Encoding(rawValue: String.Encoding.utf8.rawValue))
            if let convertedString = String(bytes: data, encoding: String.Encoding.utf8) {
                self.partialMessage = self.partialMessage + convertedString
                NSLog(convertedString)
                // Assign lengths (delimiter, MD5 digest, minimum expected length, message length)
                let delimiterLength = Constants.END_OF_MESSAGE_DELIMITER.lengthOfBytes(using: String.Encoding.utf8)
                let MD5Length = 32 // 32 characters -> hex representation of 16 bytes
                // 3 = CR+LF+1 char at least
                let minimumExpectedMessageLength = MD5Length + delimiterLength + 3
                let messageLength = self.partialMessage.lengthOfBytes(using: String.Encoding.utf8)
                // Check for delimiter and minimum expected message length (2 char msg + MD5 digest + delimiter)
                if (self.partialMessage.contains(Constants.END_OF_MESSAGE_DELIMITER)) &&
                    (messageLength >= minimumExpectedMessageLength) {
                    var message = self.partialMessage
                    // Get rid of optional CR+LF
                    var lowBound = message.index(message.endIndex, offsetBy: -1)
                    var hiBound = message.index(message.endIndex, offsetBy: 0)
                    var midRange = lowBound ..< hiBound
                    let optionalCRLF = message.substring(with: midRange)
                    if (optionalCRLF == "\r\n") || (optionalCRLF == "\0") { // Remove CR+LF if present
                        lowBound = message.index(message.endIndex, offsetBy: -1)
                        hiBound = message.index(message.endIndex, offsetBy: 0)
                        midRange = lowBound ..< hiBound
                        message.removeSubrange(midRange)
                    }
                    // Check for delimiter proper position (has to be at the end)
                    lowBound = message.index(message.endIndex, offsetBy: -delimiterLength)
                    hiBound = message.index(message.endIndex, offsetBy: 0)
                    midRange = lowBound ..< hiBound
                    let delimiter = message.substring(with: midRange)
                    if (delimiter == Constants.END_OF_MESSAGE_DELIMITER) // Delimiter in proper position?
                    {
                        // Acquire the MD digest
                        lowBound = message.index(message.endIndex, offsetBy: -(MD5Length+delimiterLength))
                        hiBound = message.index(message.endIndex, offsetBy: -(delimiterLength))
                        midRange = lowBound ..< hiBound
                        let receivedMD5 = message.substring(with: midRange)
                        // Acquire the deframed message (normalized message)
                        lowBound = message.index(message.startIndex, offsetBy: 0)
                        hiBound = message.index(message.endIndex, offsetBy: -(MD5Length+delimiterLength))
                        midRange = lowBound ..< hiBound
                        let normalizedMessage = message.substring(with: midRange)
                        // Calculate the MD5 digest on the normalized message
                        let calculatedMD5Digest = normalizedMessage.md5()
                        // Debug
                        print(delimiter)
                        print(normalizedMessage)
                        print(receivedMD5)
                        print(calculatedMD5Digest!)
                        // Check for the integrity of the data
                        if (receivedMD5.lowercased() == calculatedMD5Digest?.lowercased()) || self.noMD5Check // TEMPORARY
                        {
                            if (normalizedMessage == "Unauthorized Access")
                            {
                                // Update the authorization status
                                self.authorized = false
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }
                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)
                                // Shows an alert
                                self.showAlert(title: NSLocalizedString("Unauthorized Access", comment: "Unauthorized Access Title"), message: NSLocalizedString("Please login with the proper Username and Password before to send any command!", comment: "Unauthorized Access Message"))
                            }
                            else if (normalizedMessage == "System Busy")
                            {
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }
                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)
                                // Shows an alert
                                self.showAlert(title: NSLocalizedString("System Busy", comment: "System Busy Title"), message: NSLocalizedString("The system is busy at the moment. Only one connection at a time is allowed!", comment: "System Busy Message"))
                            }
                            else if (normalizedMessage == "Error")
                            {
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }
                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)
                                // Shows an alert
                                self.showAlert(title: NSLocalizedString("Error", comment: "Error Title"), message: NSLocalizedString("An error occurred during the execution of the command!", comment: "Command Error Message"))
                            }
                            else if (normalizedMessage == "ErrorMachineRunning")
                            {
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }
                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)
                                // Shows an alert
                                self.showAlert(title: NSLocalizedString("Error", comment: "Error Title"), message: NSLocalizedString("The command cannot be executed while the machine is running", comment: "Machine Running Message 1")+"!\r\n\n "+NSLocalizedString("Trying to execute any command in this state could be dangerous for both people and machinery", comment: "Machine Running Message 2")+".\r\n\n "+NSLocalizedString("Please stop the machine and leave the automatic or semi-automatic modes before to provide any command", comment: "Machine Running Message 3")+".")
                            }
                            else if (normalizedMessage == "Command Not Recognized")
                            {
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }
                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)
                                // Shows an alert
                                self.showAlert(title: NSLocalizedString("Error", comment: "Error Title"), message: NSLocalizedString("Command not recognized!", comment: "Command Unrecognized Message"))
                            }
                            else
                            {
                                // Stop the refresh control
                                if let refreshControl = self.refreshControl {
                                    if refreshControl.isRefreshing {
                                        refreshControl.endRefreshing()
                                    }
                                }
                                // Stop the stream
                                NSLog("stream task stop")
                                self.stop(task: task)
                                //let testMessage = "test\r\nf3ea0b9bff4a2c79e60acf6873f4a1ce</EOM>\r\n"
                                //normalizedMessage = testMessage
                                // Process the received csv file
                                self.processCsvData(file: normalizedMessage)
                            }
                        }
                        else
                        {
                            // Stop the refresh control
                            if let refreshControl = self.refreshControl {
                                if refreshControl.isRefreshing {
                                    refreshControl.endRefreshing()
                                }
                            }
                            // Stop the stream
                            NSLog("stream task stop")
                            self.stop(task: task)
                            // Shows an alert
                            self.showAlert(title: NSLocalizedString("Data Error", comment: "Data Error Title"), message: NSLocalizedString("The received data cannot be read since it's corrupted or incomplete!", comment: "Data Error Message"))
                        }
                    }
                    else
                    {
                        // Stop the refresh control
                        if let refreshControl = self.refreshControl {
                            if refreshControl.isRefreshing {
                                refreshControl.endRefreshing()
                            }
                        }
                        // Stop the stream
                        NSLog("stream task stop")
                        self.stop(task: task)
                        // Shows an alert
                        self.showAlert(title: NSLocalizedString("Data Error", comment: "Data Error Title"), message: NSLocalizedString("The received data cannot be read since it's corrupted or incomplete!", comment: "Data Error Message"))
                    }
                }
            }
        }
        if eof {
            // Stop the refresh control
            if let refreshControl = self.refreshControl {
                if refreshControl.isRefreshing {
                    refreshControl.endRefreshing()
                }
            }
            // Refresh the tableview content
            self.tableView.reloadData()
            // Stop the stream
            NSLog("stream task end")
            self.stop(task: task)
        } else if error == nil {
            self.startRead(for: task)
        } else {
            // We ignore the error because we'll see it again in `didCompleteWithError`.
            NSLog("stream task read error")
        }
    }
}
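It is critical that the data object represents the bytes of the whole string, not just a substring. If you try to convert a substring from partial data of the whole string, it will fail in many cases.
It works with 1-byte characters because, no matter where you chop the data stream, the partial data still represents a valid string. But once you start dealing with multi-byte characters, a partial read can easily leave the first or last byte of the data as only part of a multi-byte character. The data then cannot be interpreted as UTF-8, and the conversion returns nil.
So you must make sure you build a data object from all of the bytes of a given string before attempting to convert that data to a string.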
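The failure described above can be reproduced in isolation. This is a minimal sketch, not the asker's code: the string "über" and the split position are made up for illustration, and the split simulates a read that ends in the middle of "ü".

```swift
import Foundation

// "über" encoded as UTF-8: "ü" takes two bytes (0xC3 0xBC).
let whole = Data("über".utf8)            // C3 BC 62 65 72

// Simulate a read that ends after the first byte of "ü".
let firstChunk = whole.prefix(1)         // C3
let secondChunk = whole.dropFirst(1)     // BC 62 65 72

// Decoding each chunk on its own fails: the first one ends with a truncated
// sequence, the second one starts with a continuation byte.
let decodedFirst = String(data: firstChunk, encoding: .utf8)
let decodedSecond = String(data: secondChunk, encoding: .utf8)

// Concatenating the raw bytes first and decoding once works.
var buffer = Data()
buffer.append(firstChunk)
buffer.append(secondChunk)
let decodedWhole = String(data: buffer, encoding: .utf8)
```

Here `decodedFirst` and `decodedSecond` are both nil, while `decodedWhole` contains the original string.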
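Typically you should start your data with a byte count. Let's say the first four bytes represent a 32-bit integer in some agreed-upon endianness. You read those four bytes to get the length. You then keep reading data until you have received that many more bytes. At that point you know you are at the end of the message.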
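A length-prefixed reader along these lines could look roughly like the following. It is only a sketch: the type name `LengthPrefixedDecoder`, the big-endian byte order, and the idea that the server sends such a header are all assumptions, since the original protocol does not include a length field.

```swift
import Foundation

/// Accumulates raw bytes and yields complete messages framed as
/// [4-byte big-endian length][UTF-8 payload]. The framing is an assumption.
struct LengthPrefixedDecoder {
    private var buffer = Data()

    mutating func append(_ chunk: Data) -> [String] {
        buffer.append(chunk)
        var messages: [String] = []
        while buffer.count >= 4 {
            // Assemble the length byte by byte (big-endian).
            var length = 0
            for byte in buffer.prefix(4) {
                length = (length << 8) | Int(byte)
            }
            // Stop until the whole payload has arrived.
            guard buffer.count >= 4 + length else { break }
            let payload = buffer.dropFirst(4).prefix(length)
            // Decode only now that every byte of the message is present.
            if let text = String(data: payload, encoding: .utf8) {
                messages.append(text)
            }
            // Copy the rest into a fresh Data so indices start at 0 again.
            buffer = Data(buffer.dropFirst(4 + length))
        }
        return messages
    }
}

// Usage: a 2-byte payload ("ü") whose second byte arrives in a later read.
var lengthDecoder = LengthPrefixedDecoder()
let framed = Data([0, 0, 0, 2, 0xC3, 0xBC])
let afterFirstRead = lengthDecoder.append(framed.prefix(5))
let afterSecondRead = lengthDecoder.append(framed.dropFirst(5))
```

The first call returns an empty array because the payload is still incomplete; the second call returns the decoded message.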
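The problem with using an "end of message" marker at the end of your data is that the marker itself could be split across reads. Either way, you need to refactor your code to process at the data level, and not attempt to convert the data to a string until all of the string's data has been read.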
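If the marker-based protocol has to stay, the search for the marker can be moved to the byte level as well. The sketch below assumes the marker is "</EOM>" (a guess based on the commented-out test message in the question; the real value of `Constants.END_OF_MESSAGE_DELIMITER` is not shown), and `DelimitedDecoder` is a hypothetical name.

```swift
import Foundation

/// Buffers raw bytes and splits on an end-of-message marker at the Data level.
/// The "</EOM>" marker is an assumption, not taken from the original constants.
struct DelimitedDecoder {
    private var buffer = Data()
    private let marker = Data("</EOM>".utf8)

    mutating func append(_ chunk: Data) -> [String] {
        buffer.append(chunk)
        var messages: [String] = []
        // The marker is searched for in the accumulated bytes, so it is
        // found even when it arrived split across two reads.
        while let range = buffer.range(of: marker) {
            let payload = buffer[buffer.startIndex..<range.lowerBound]
            if let text = String(data: payload, encoding: .utf8) {
                messages.append(text)
            }
            // Keep whatever follows the marker for the next message.
            buffer = Data(buffer[range.upperBound...])
        }
        return messages
    }
}

// Usage: the marker is split across two reads.
var markerDecoder = DelimitedDecoder()
let beforeMarker = markerDecoder.append(Data("grün</E".utf8))
let afterMarker = markerDecoder.append(Data("OM>".utf8))
```

Nothing is decoded after the first read; the complete message is returned once the rest of the marker arrives. The MD5 check and CR+LF handling from the question would then operate on the returned string.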
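As you know, a single UTF-8 character is 1, 2, 3 or 4 bytes long. In your case you need to handle 1-byte and 2-byte characters, and the byte sequence you receive may not be aligned to a character boundary. However, as rmaddy pointed out, a byte sequence converted with String.Encoding.utf8 must start and end on a character boundary.
There are two ways to handle this situation. One, as rmaddy suggests, is to send the length first and count the incoming data bytes. The drawback is that you also have to modify the transmitting (server) side, which may not be possible.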
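The other option is to scan the incoming sequence byte by byte, keep track of the character boundaries, and build up a legitimate UTF-8 byte sequence. Fortunately, UTF-8 is designed so that you can identify where the character boundaries are by looking at any byte in the stream. Specifically, the first byte of a 1-, 2-, 3- or 4-byte UTF-8 character has the bit pattern 0xxxxxxx, 110xxxxx, 1110xxxx or 11110xxx respectively, and the second to fourth bytes all have the pattern 10xxxxxx. This makes your life a lot easier.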
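One way to apply these bit patterns is to hold back a trailing, still incomplete character and decode only the bytes before it. The following is a sketch under the assumption that the stream is otherwise valid UTF-8; `completePrefixLength(of:)` and `UTF8StreamDecoder` are hypothetical names introduced here.

```swift
import Foundation

/// Returns how many leading bytes of `bytes` end on a character boundary,
/// i.e. everything except a trailing, still incomplete multi-byte character.
func completePrefixLength(of bytes: Data) -> Int {
    let tail = [UInt8](bytes.suffix(4))
    var i = tail.count - 1
    // Step back over continuation bytes (10xxxxxx) to the last lead byte.
    while i >= 0, tail[i] & 0b1100_0000 == 0b1000_0000 {
        i -= 1
    }
    // No lead byte found in the tail: leave it to the decoder to reject.
    guard i >= 0 else { return bytes.count }
    let lead = tail[i]
    let expected: Int
    if lead & 0b1000_0000 == 0 {
        expected = 1                      // 0xxxxxxx
    } else if lead & 0b1110_0000 == 0b1100_0000 {
        expected = 2                      // 110xxxxx
    } else if lead & 0b1111_0000 == 0b1110_0000 {
        expected = 3                      // 1110xxxx
    } else if lead & 0b1111_1000 == 0b1111_0000 {
        expected = 4                      // 11110xxx
    } else {
        return bytes.count                // not a valid lead byte
    }
    let available = tail.count - i
    return available < expected ? bytes.count - available : bytes.count
}

/// Decodes chunks as they arrive, carrying an incomplete character over
/// to the next call. Returns nil if the complete part is not valid UTF-8.
struct UTF8StreamDecoder {
    private var pending = Data()

    mutating func decode(_ chunk: Data) -> String? {
        pending.append(chunk)
        let n = completePrefixLength(of: pending)
        let text = String(data: pending.prefix(n), encoding: .utf8)
        pending = Data(pending.dropFirst(n))
        return text
    }
}

// Usage: "aü" arrives with "ü" split across two reads.
var streamDecoder = UTF8StreamDecoder()
let firstPart = streamDecoder.decode(Data([0x61, 0xC3]))
let secondPart = streamDecoder.decode(Data([0xBC]))
```

The first call yields only "a" and keeps the byte 0xC3 pending; the second call completes the character and yields "ü". Only the last four bytes are inspected on each call, because an incomplete character can be at most three bytes long.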
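If you choose your "end of message" marker from among the 1-byte UTF-8 characters, you can detect the EOM easily and reliably without tracking byte sequences at all, because it is a single byte and never appears anywhere inside a 2- to 4-byte character.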
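As a small illustration of that property, the sketch below splits a buffer on a single marker byte. The marker value 0x04 (ASCII EOT) and the sample payloads are assumptions chosen for the example; the original protocol uses a different delimiter.

```swift
import Foundation

let eom: UInt8 = 0x04                     // hypothetical single-byte marker
var received = Data("grün".utf8)          // one complete message...
received.append(eom)
received.append(Data("€5".utf8))          // ...and the start of the next one

// Every byte of a multi-byte character has its high bit set, so a byte
// equal to 0x04 can only be the marker itself.
let pieces = received.split(separator: eom, omittingEmptySubsequences: false)

// All pieces except the last are complete messages and safe to decode.
let completeMessages = pieces.dropLast().compactMap {
    String(data: $0, encoding: .utf8)
}
// The last piece stays in the buffer until its marker arrives.
let leftover = Array(pieces.last ?? Data())
```

Only the first message is decoded; the bytes after the marker remain as raw data for the next read.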