Decoding quoted-printable messages in Swift

Question

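I have a quoted-printable string such as "The cost would be =C2=A31,000". How do I convert it to "The cost would be £1,000"?

At the moment I am converting the text manually, but this does not cover all cases. I am sure there is a single line of code that would take care of this.

Here is my code: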

func decodeUTF8(message: String) -> String
{
    var newMessage = message.stringByReplacingOccurrencesOfString("=2E", withString: ".", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A2", withString: "•", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=C2=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9C", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A6", withString: "…", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9D", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=92", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=3D", withString: "=", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=20", withString: "", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=99", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)

    return newMessage
}

Thanks

Answer 1

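A simple way is to use the (NS)String method stringByRemovingPercentEncoding for this purpose. This was observed in "decoding quoted-printables", so the first solution is mainly a translation of the answers in that thread to Swift.

The idea is to replace the quoted-printable "=NN" encoding with the percent encoding "%NN" and then use the existing method to remove the percent encoding.

Continuation lines (soft line breaks) are handled separately. Also, percent characters in the input string must be encoded first, otherwise they would be treated as the leading character of a percent encoding.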

func decodeQuotedPrintable(message : String) -> String? {
    return message
        .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
        .stringByReplacingOccurrencesOfString("=\n", withString: "")
        .stringByReplacingOccurrencesOfString("%", withString: "%25")
        .stringByReplacingOccurrencesOfString("=", withString: "%")
        .stringByRemovingPercentEncoding
}

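The function returns an optional string, which is nil for invalid input. Invalid input can be:

An "=" character that is not followed by two hexadecimal digits, e.g. "=XX".
An "=NN" sequence that does not decode to a valid UTF-8 sequence, e.g. "=E2=64".

Examples: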

if let decoded = decodeQuotedPrintable("=C2=A31,000") {
    print(decoded) // £1,000
}

if let decoded = decodeQuotedPrintable("=E2=80=9CHello =E2=80=A6 world!=E2=80=9D") {
    print(decoded) // “Hello … world!”
}
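
For readers on current Swift, the same percent-encoding trick can be sketched with the renamed Foundation methods. This is only an illustration of the idea described above, not the answer's original code; the function name decodeQuotedPrintableViaPercent is made up for this example, and it still assumes UTF-8 content.

```swift
import Foundation

/// Decodes a quoted-printable string by rewriting "=NN" as "%NN" and
/// letting Foundation remove the percent encoding (UTF-8 only).
/// Returns nil for malformed escapes or bytes that are not valid UTF-8.
func decodeQuotedPrintableViaPercent(_ message: String) -> String? {
    return message
        .replacingOccurrences(of: "=\r\n", with: "")  // soft line break (CRLF)
        .replacingOccurrences(of: "=\n", with: "")    // soft line break (LF)
        .replacingOccurrences(of: "%", with: "%25")   // protect literal percent signs
        .replacingOccurrences(of: "=", with: "%")     // "=NN" becomes "%NN"
        .removingPercentEncoding
}

print(decodeQuotedPrintableViaPercent("=C2=A31,000") ?? "nil")        // £1,000
print(decodeQuotedPrintableViaPercent("50% off=\r\n today") ?? "nil") // 50% off today
print(decodeQuotedPrintableViaPercent("=XX") ?? "nil")                // nil
```

The order of the replacements matters: literal "%" characters have to be escaped before "=" is turned into "%", otherwise they would be mistaken for the start of an escape sequence.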

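Update 1: The above code assumes that the message uses the UTF-8 encoding for quoting non-ASCII characters, as in most of your examples: C2 A3 is the UTF-8 encoding of "£", and E2 80 A6 is the UTF-8 encoding of "…".

If the input is "Rub=E9n", then the message uses the Windows-1252 encoding. To decode that correctly, you have to replace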

.stringByRemovingPercentEncoding

by

.stringByReplacingPercentEscapesUsingEncoding(NSWindowsCP1252StringEncoding)

There are also methods to detect the encoding from a "Content-Type" header field; compare e.g. https://stackoverflow.com/a/32051684/1187415.
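
As a rough illustration of what such detection could look like, here is a minimal sketch that pulls the charset parameter out of a Content-Type header value and maps a few common names to String.Encoding. The helper name stringEncoding(fromContentType:) and the small lookup table are assumptions made for this example; they are not taken from the linked answer, and a real implementation would need to cover many more charset names.

```swift
import Foundation

/// Hypothetical helper: extracts the charset parameter from a Content-Type
/// header value and maps a few common names to String.Encoding.
/// The lookup table is illustrative and far from complete.
func stringEncoding(fromContentType header: String) -> String.Encoding? {
    let knownCharsets: [String: String.Encoding] = [
        "utf-8": .utf8,
        "us-ascii": .ascii,
        "iso-8859-1": .isoLatin1,
        "windows-1252": .windowsCP1252,
    ]
    // Parameters follow the media type, separated by semicolons.
    for parameter in header.split(separator: ";").dropFirst() {
        let parts = parameter.split(separator: "=", maxSplits: 1)
        guard parts.count == 2,
              parts[0].trimmingCharacters(in: .whitespaces).lowercased() == "charset"
        else { continue }
        // Strip surrounding whitespace and optional double quotes from the value.
        let name = parts[1]
            .trimmingCharacters(in: CharacterSet(charactersIn: " \t\""))
            .lowercased()
        return knownCharsets[name]
    }
    return nil // No charset parameter found
}

if let encoding = stringEncoding(fromContentType: "text/plain; charset=\"Windows-1252\"") {
    print(encoding == String.Encoding.windowsCP1252) // true
}
```

The resulting encoding could then be passed to whichever decoding function is used, instead of hard-coding UTF-8 or Windows-1252.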

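Update 2: The stringByReplacingPercentEscapesUsingEncoding method is marked as deprecated, so the above code will always generate a compiler warning. Unfortunately, Apple does not seem to have provided an alternative method.

So here is a new, completely self-contained decoding method that does not cause any compiler warning. This time I have written it as an extension method of String. Explanatory comments are in the code.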

extension String {

    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding. 
    /// - parameter encoding:     A string encoding. The default is UTF-8.
    /// - returns:                The decoded string, or `nil` for invalid input.

    func decodeQuotedPrintable(encoding enc : NSStringEncoding = NSUTF8StringEncoding) -> String? {

        // Handle soft line breaks, then replace quoted-printable escape sequences. 
        return self
            .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
            .stringByReplacingOccurrencesOfString("=\n", withString: "")
            .decodeQuotedPrintableSequences(enc)
    }

    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.

    private func decodeQuotedPrintableSequences(enc : NSStringEncoding) -> String? {

        var result = ""
        var position = startIndex

        // Find the next "=" and copy characters preceding it to the result:
        while let range = rangeOfString("=", range: position ..< endIndex) {
            result.appendContentsOf(self[position ..< range.startIndex])
            position = range.startIndex

            // Decode one or more successive "=HH" sequences to a byte array:
            let bytes = NSMutableData()
            repeat {
                let hexCode = self[position.advancedBy(1) ..< position.advancedBy(3, limit: endIndex)]
                if hexCode.characters.count < 2 {
                    return nil // Incomplete hex code
                }
                guard var byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.appendBytes(&byte, length: 1)
                position = position.advancedBy(3)
            } while position != endIndex && self[position] == "="

            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.appendContentsOf(dec)
        }

        // Copy remaining characters to the result:
        result.appendContentsOf(self[position ..< endIndex])

        return result
    }
}

Usage examples:

if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}

if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}

if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: NSWindowsCP1252StringEncoding) {
    print(decoded) // Rubén
}

Update for Swift 4 (and later):

import Foundation

extension String {

    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding.
    /// - parameter encoding:     A string encoding. The default is UTF-8.
    /// - returns:                The decoded string, or `nil` for invalid input.

    func decodeQuotedPrintable(encoding enc : String.Encoding = .utf8) -> String? {

        // Handle soft line breaks, then replace quoted-printable escape sequences.
        return self
            .replacingOccurrences(of: "=\r\n", with: "")
            .replacingOccurrences(of: "=\n", with: "")
            .decodeQuotedPrintableSequences(encoding: enc)
    }

    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.

    private func decodeQuotedPrintableSequences(encoding enc : String.Encoding) -> String? {

        var result = ""
        var position = startIndex

        // Find the next "=" and copy characters preceding it to the result:
        while let range = range(of: "=", range: position..<endIndex) {
            result.append(contentsOf: self[position ..< range.lowerBound])
            position = range.lowerBound

            // Decode one or more successive "=HH" sequences to a byte array:
            var bytes = Data()
            repeat {
                let hexCode = self[position...].dropFirst().prefix(2)
                if hexCode.count < 2 {
                    return nil // Incomplete hex code
                }
                guard let byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.append(byte)
                position = index(position, offsetBy: 3)
            } while position != endIndex && self[position] == "="

            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.append(contentsOf: dec)
        }

        // Copy remaining characters to the result:
        result.append(contentsOf: self[position ..< endIndex])

        return result
    }
}

Usage examples:

if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}

if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}

if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: .windowsCP1252) {
    print(decoded) // Rubén
}

Answer 2

Unfortunately, my answer is a bit late, but it may be helpful to others.

var string = "The cost would be =C2=A31,000"

var finalString: String? = nil

if let regEx = try? NSRegularExpression(pattern: "={1}?([a-f0-9]{2}?)", options: NSRegularExpressionOptions.CaseInsensitive)
{
    let intermediatePercentEscapedString = regEx.stringByReplacingMatchesInString(string, options: NSMatchingOptions.WithTransparentBounds, range: NSMakeRange(0, string.utf16.count), withTemplate: "%$1")
    print(intermediatePercentEscapedString)
    finalString = intermediatePercentEscapedString.stringByRemovingPercentEncoding
    print(finalString)
}

Answer 3

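This encoding is called 'quoted-printable'. What you need to do is convert the string to NSData using ASCII encoding, then iterate over the data, replacing every 3-character sequence like '=A3' with the single byte 0xA3, and finally convert the resulting data to a string using NSUTF8StringEncoding.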
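
The byte-oriented procedure described here might look roughly like the following in current Swift. This is a sketch under a few assumptions: the function and helper names are invented for the example, soft line breaks are not handled, and the decoded bytes are taken to be UTF-8.

```swift
import Foundation

/// Returns the numeric value of an ASCII hex digit, or nil if it is not one.
func hexValue(_ byte: UInt8) -> UInt8? {
    switch byte {
    case 48...57:  return byte - 48   // "0"..."9"
    case 65...70:  return byte - 55   // "A"..."F"
    case 97...102: return byte - 87   // "a"..."f"
    default:       return nil
    }
}

/// Sketch of the byte-oriented approach: walk the ASCII bytes, turn each
/// "=HH" triple into the single byte 0xHH, then decode the bytes as UTF-8.
/// Soft line breaks ("=" at the end of a line) are not handled here.
func decodeQuotedPrintableBytes(_ message: String) -> String? {
    guard let input = message.data(using: .ascii) else { return nil }
    let bytes = [UInt8](input)
    var output = [UInt8]()
    var i = 0
    while i < bytes.count {
        if bytes[i] == UInt8(ascii: "=") {
            // An escape needs two hex digits after the "=".
            guard i + 2 < bytes.count,
                  let high = hexValue(bytes[i + 1]),
                  let low = hexValue(bytes[i + 2])
            else { return nil }
            output.append(high << 4 | low)
            i += 3
        } else {
            output.append(bytes[i])
            i += 1
        }
    }
    return String(bytes: output, encoding: .utf8)
}

print(decodeQuotedPrintableBytes("The cost would be =C2=A31,000") ?? "nil")
// The cost would be £1,000
```

Working on bytes rather than characters makes it explicit that "=C2=A3" is two bytes that only become one character once the whole sequence is interpreted as UTF-8.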

Answer 4

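More information is needed to give an applicable solution, so I will make some assumptions.

In an HTML or mail message, for example, one or more kinds of encoding can be applied to some source data. For example, you could encode a binary file such as a PNG file with base64 and then compress it. The order matters.

In your example, the source data is a string that has been encoded as UTF-8.

In an HTTP message, your Content-Type is therefore text/plain; charset=UTF-8. In your example, an additional encoding seems to have been applied as well, a "Content-Transfer-Encoding": possibly quoted-printable or base64 (I am not sure about that, though).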

To revert this, you need to apply the corresponding decodings in the reverse order.
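
To make the "reverse order" point concrete, here is a small sketch for the base64 case: the sender first turned the text into bytes using a charset and then encoded those bytes for transfer, so the receiver undoes the transfer encoding first and the charset second. The function name decodeBase64Body and the sample body are assumptions chosen for illustration only.

```swift
import Foundation

/// Hypothetical two-layer decode for a base64 body.
/// Encoding order was: text -> bytes (charset) -> base64 (transfer encoding),
/// so decoding runs the other way round.
func decodeBase64Body(_ body: String, charset: String.Encoding) -> String? {
    // Step 1: undo the Content-Transfer-Encoding (base64 text -> raw bytes).
    guard let rawBytes = Data(base64Encoded: body) else { return nil }
    // Step 2: undo the charset encoding (raw bytes -> text).
    return String(data: rawBytes, encoding: charset)
}

// "wqMxLDAwMA==" is the base64 form of the UTF-8 bytes C2 A3 31 2C 30 30 30.
print(decodeBase64Body("wqMxLDAwMA==", charset: .utf8) ?? "nil") // £1,000
```

A quoted-printable body follows the same pattern, with the "=HH" decoding taking the place of the base64 step.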

Hint:

When viewing the raw source of a mail message, you can see its headers (Content-Type and Content-Transfer-Encoding).

Answer 5

You can also take a look at this working solution: https://github.com/dunkelstern/QuotedPrintable

let result = QuotedPrintable.decode(string: quoted)

5 answers

Answer 1: score 4, accepted, 2015-09-28 16:22:40
Answer 2: score 1, 2015-10-02 08:16:00
Answer 3: score 0, 2015-09-26 11:40:25
Answer 4: score 0, 2015-09-26 11:50:40
Answer 5: score 0, 2016-07-05 21:33:01
