Using Swift, how do you re-encode then decode a String like this short script in Python?
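I'm having some trouble with XKCD's API and a weird encoding issue.

The solution (in Python) is to encode the string as Latin-1 and then decode it as UTF-8, but how do I do that in Swift?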
Test string:

`"Be careful\u00e2\u0080\u0094it's breeding season"`

Expected output:

`Be careful—it's breeding season`
Python (from the linked solution):

```python
import json
a = '''"Be careful\u00e2\u0080\u0094it's breeding season"'''
print(json.loads(a).encode('latin1').decode('utf8'))
```
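Why the Python trick works: Latin-1 maps code points U+0000 through U+00FF one-to-one onto the bytes 0x00 through 0xFF, so `encode('latin1')` turns each mis-decoded character back into the byte it came from, and `decode('utf8')` then reads those bytes correctly. A minimal Swift sketch of that step done by hand, without `.isoLatin1`; `latin1BytesAsUTF8` is a hypothetical name for illustration, not a Foundation API:

```swift
import Foundation

/// Hypothetical helper: reinterprets each Unicode scalar as one byte (the
/// Latin-1 mapping, valid only for U+0000...U+00FF) and decodes the resulting
/// bytes as UTF-8. Returns nil if a scalar is above U+00FF or if the bytes
/// are not valid UTF-8.
func latin1BytesAsUTF8(_ mojibake: String) -> String? {
    var bytes: [UInt8] = []
    bytes.reserveCapacity(mojibake.unicodeScalars.count)
    for scalar in mojibake.unicodeScalars {
        // Latin-1 can only represent the first 256 code points.
        guard scalar.value <= 0xFF else { return nil }
        bytes.append(UInt8(scalar.value))
    }
    // Failable initializer: nil on invalid UTF-8 instead of inserting U+FFFD.
    return String(bytes: bytes, encoding: .utf8)
}

// The scalars U+00E2 U+0080 U+0094 become the bytes E2 80 94,
// which are the UTF-8 encoding of U+2014 EM DASH.
let broken = "Be careful\u{00e2}\u{0080}\u{0094}it's breeding season"
print(latin1BytesAsUTF8(broken) ?? "not recoverable")
```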
How would this be done in Swift? This is what I tried:

```swift
let strdata = "Be careful\\u00e2\\u0080\\u0094it's breeding season".data(using: .isoLatin1)!
let str = String(data: strdata, encoding: .utf8)
```

That doesn't work!
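The attempt fails because the Swift literal is not the same string the Python code works on: `json.loads` has already turned each `\u00e2` escape into the scalar U+00E2, whereas `"\\u00e2"` in Swift is six plain ASCII characters. A small sketch to illustrate (the variable names are only for this example):

```swift
import Foundation

// "\\u00e2" in a Swift literal is six ASCII characters (backslash, "u", four
// hex digits), not the scalar U+00E2 that Python's json.loads produces.
let escaped = "Be careful\\u00e2\\u0080\\u0094it's breeding season"
print(escaped.contains("\\"))  // true

// Pure ASCII survives a Latin-1 -> UTF-8 round trip unchanged,
// so applying the "fix" to the escaped text is a no-op.
let roundTrip = escaped.data(using: .isoLatin1).flatMap { String(data: $0, encoding: .utf8) }
print(roundTrip == escaped)  // true

// With the real scalars (what JSON decoding yields), the same round trip works.
let realScalars = "Be careful\u{00e2}\u{0080}\u{0094}it's breeding season"
let fixed = realScalars.data(using: .isoLatin1).flatMap { String(data: $0, encoding: .utf8) }
print(fixed ?? "nil")  // Be careful—it's breeding season
```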
You have to decode the JSON data first, then extract the string, and finally "fix" the string. Here is a self-contained example using the JSON from https://xkcd.com/1814/info.0.json:
```swift
import Foundation

let data = """
{"month": "3", "num": 1814, "link": "", "year": "2017", "news": "",
"safe_title": "Color Pattern", "transcript": "",
"alt": "\\u00e2\\u0099\\u00ab When the spacing is tight / And the difference is slight / That's a moir\\u00c3\\u00a9 \\u00e2\\u0099\\u00ab",
"img": "https://imgs.xkcd.com/comics/color_pattern.png",
"title": "Color Pattern", "day": "22"}
""".data(using: .utf8)!
// Alternatively, fetch the live JSON:
// let url = URL(string: "https://xkcd.com/1814/info.0.json")!
// let data = try! Data(contentsOf: url)

do {
    if let dict = (try JSONSerialization.jsonObject(with: data, options: [])) as? [String: Any],
       var alt = dict["alt"] as? String {
        // Now try to fix the "alt" string: re-encode as Latin-1, decode as UTF-8
        if let isoData = alt.data(using: .isoLatin1),
           let altFixed = String(data: isoData, encoding: .utf8) {
            alt = altFixed
        }
        print(alt)
        // ♫ When the spacing is tight / And the difference is slight / That's a moiré ♫
    }
} catch {
    print(error)
}
```
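If you prefer `Codable` over `JSONSerialization`, the same decode-then-fix idea can be sketched with `JSONDecoder`. The `Comic` struct and the `repairingLatin1Mojibake` property are hypothetical names for this sketch, and the JSON is cut down to a few fields of the example above. The repair falls back to the original string when it cannot be encoded as Latin-1 or the resulting bytes are not valid UTF-8, so most text that was already correct is left alone:

```swift
import Foundation

// Hypothetical model: only the fields used here; JSONDecoder ignores the rest.
struct Comic: Decodable {
    let title: String
    let alt: String
}

extension String {
    /// Hypothetical helper: undoes UTF-8 bytes that were mis-decoded as
    /// Latin-1. Returns the string unchanged if it cannot be repaired.
    var repairingLatin1Mojibake: String {
        guard let bytes = data(using: .isoLatin1),
              let repaired = String(data: bytes, encoding: .utf8) else { return self }
        return repaired
    }
}

let json = """
{"title": "Color Pattern", "num": 1814,
 "alt": "\\u00e2\\u0099\\u00ab When the spacing is tight / And the difference is slight / That's a moir\\u00c3\\u00a9 \\u00e2\\u0099\\u00ab"}
""".data(using: .utf8)!

do {
    let comic = try JSONDecoder().decode(Comic.self, from: json)
    print(comic.alt.repairingLatin1Mojibake)
    // ♫ When the spacing is tight / And the difference is slight / That's a moiré ♫
} catch {
    print(error)
}
```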
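If you only have a string of the form `Be careful\u00e2\u0080\u0094it's breeding season`, you can still use `JSONSerialization` to decode the `\uNNNN` escape sequences and then proceed as above.

A simple example (error checking omitted for brevity):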
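```swift
import Foundation

let strbad = "Be careful\\u00e2\\u0080\\u0094it's breeding season"
// Wrap the text in quotes so JSONSerialization parses it as a JSON string
// fragment, which decodes the \uNNNN escapes.
let decoded = try! JSONSerialization.jsonObject(with: Data("\"\(strbad)\"".utf8), options: .allowFragments) as! String
// Then apply the Latin-1 -> UTF-8 fix as above.
let strgood = String(data: decoded.data(using: .isoLatin1)!, encoding: .utf8)!
print(strgood)
// Be careful—it's breeding season
```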
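For input you don't control, the same approach can be wrapped in a throwing function instead of using `try!` and `!`. This is a sketch only: `decodeEscapedMojibake` and `EscapeDecodingError` are hypothetical names, and it assumes the text contains no unescaped `"` and no raw control characters, since neither is allowed inside a JSON string.

```swift
import Foundation

// Hypothetical error type for this sketch.
enum EscapeDecodingError: Error {
    case notAString          // defensive: a quoted fragment should always be a String
    case notLatin1Mojibake   // decoded text is not UTF-8 bytes misread as Latin-1
}

/// Hypothetical helper: decodes JSON-style \uNNNN escapes by parsing the text
/// as a quoted JSON string fragment, then applies the Latin-1 -> UTF-8 repair.
/// Throws instead of crashing on malformed input.
func decodeEscapedMojibake(_ escaped: String) throws -> String {
    let json = Data("\"\(escaped)\"".utf8)
    guard let decoded = try JSONSerialization.jsonObject(
        with: json, options: .allowFragments) as? String else {
        throw EscapeDecodingError.notAString
    }
    guard let bytes = decoded.data(using: .isoLatin1),
          let repaired = String(data: bytes, encoding: .utf8) else {
        throw EscapeDecodingError.notLatin1Mojibake
    }
    return repaired
}

do {
    print(try decodeEscapedMojibake("Be careful\\u00e2\\u0080\\u0094it's breeding season"))
    // Be careful—it's breeding season
} catch {
    print("Could not decode:", error)
}
```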
I couldn't find anything built in, but I did manage to write this for you:
```swift
import Foundation

extension String {
    func range(nsRange: NSRange) -> Range<Index> {
        return Range(nsRange, in: self)!
    }

    func nsRange(range: Range<Index>) -> NSRange {
        return NSRange(range, in: self)
    }

    var fullRange: Range<Index> {
        return startIndex..<endIndex
    }

    var fullNSRange: NSRange {
        return nsRange(range: fullRange)
    }

    subscript(nsRange: NSRange) -> Substring {
        return self[range(nsRange: nsRange)]
    }

    func convertingUnicodeCharacters() -> String {
        var string = self
        // Characters need to be replaced in groups in case of clusters
        let groupedRegex = try! NSRegularExpression(pattern: "(\\\\u[0-9a-fA-F]{1,8})+")
        // Iterate in reverse so earlier match ranges stay valid after each replacement
        for match in groupedRegex.matches(in: string, range: string.fullNSRange).reversed() {
            let groupedHexValues = String(string[match.range])
            var characters = [Character]()
            let regex = try! NSRegularExpression(pattern: "\\\\u([0-9a-fA-F]{1,8})")
            for hexMatch in regex.matches(in: groupedHexValues, range: groupedHexValues.fullNSRange) {
                // Resolve the capture range against groupedHexValues, the string it was matched in
                let hexString = groupedHexValues[hexMatch.range(at: 1)]
                if let hexValue = UInt32(hexString, radix: 16),
                   let scalar = UnicodeScalar(hexValue) {
                    characters.append(Character(scalar))
                }
            }
            string.replaceSubrange(Range(match.range, in: string)!, with: characters)
        }
        return string
    }
}
```
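It basically looks for any `\u<1-8 digit hex>` sequence and converts each one to a Unicode scalar. Should be fairly straightforward... On its own it only decodes the escapes, so the result still contains the mis-decoded characters (as the test below shows); the Latin-1 to UTF-8 step is still needed afterwards.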
My playground test code is simply:
```swift
let string = "Be careful\\u00e2\\u0080\\u0094-\\u1F496\\u65\\u301it's breeding season"
let expected = "Be careful\u{00e2}\u{0080}\u{0094}-\u{1f496}\u{65}\u{301}it's breeding season"
string.convertingUnicodeCharacters() == expected // true 🎉
```
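The regex version accepts 1 to 8 hex digits per escape, so an escape followed by text that starts with a hex letter (for example `\u0094abc`) swallows those letters into one large code point. As an alternative technique, swapped in here rather than taken from the answer, the sketch below follows the JSON convention of exactly four hex digits per escape and treats each escape as a UTF-16 code unit, which also lets surrogate pairs such as `\uD83D\uDC96` decode. Under this convention the playground test above reads differently: `\u1F496` becomes `\u1F49` followed by `6`, and `\u65` and `\u301` are left untouched. It also does not handle other escapes such as `\\` or `\n`. `decodingUnicodeEscapes` is a hypothetical name.

```swift
import Foundation

/// Hypothetical helper: replaces each complete \uXXXX escape (exactly four
/// hex digits) with that UTF-16 code unit and copies everything else as-is.
func decodingUnicodeEscapes(_ input: String) -> String {
    let units = Array(input.utf16)
    var output: [UInt16] = []
    output.reserveCapacity(units.count)
    let backslash = UInt16(UInt8(ascii: "\\"))
    let lowercaseU = UInt16(UInt8(ascii: "u"))
    var i = 0
    while i < units.count {
        // A complete escape needs "\", "u" and four more code units.
        if units[i] == backslash, i + 5 < units.count, units[i + 1] == lowercaseU {
            let hex = String(decoding: units[(i + 2)...(i + 5)], as: UTF16.self)
            if hex.allSatisfy({ $0.isHexDigit }), let unit = UInt16(hex, radix: 16) {
                output.append(unit)  // one code unit, possibly half of a surrogate pair
                i += 6
                continue
            }
        }
        output.append(units[i])  // not a complete escape: copy through unchanged
        i += 1
    }
    // String(decoding:as:) joins surrogate pairs and replaces lone ones with U+FFFD.
    return String(decoding: output, as: UTF16.self)
}

// Decode the escapes, then apply the Latin-1 -> UTF-8 repair from the question.
let raw = "Be careful\\u00e2\\u0080\\u0094it's breeding season"
let mojibake = decodingUnicodeEscapes(raw)
if let bytes = mojibake.data(using: .isoLatin1),
   let fixed = String(data: bytes, encoding: .utf8) {
    print(fixed)  // Be careful—it's breeding season
}
```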