JSON return with special characters
I'm consuming an API that returns JSON containing the characters "\u0083", "\u0087d" and "\u008d". I tried changing the encoding to UTF-8 and to ISO-8859-1, including in the request headers, but neither worked. Could someone help? The API I'm consuming will not be changed.
Examples:
'''
"prop": "SÃ\O LUÃ\S",
"prop": "RUA LUIZ GUIMARÃ\ES",
"prop": "POÃ\O DA PANELA"
'''
You have UTF-8 bytes being decoded as ISO-8859-1.
'SÃO LUÍS' encoded as UTF-8 results in these bytes (the notation is Python, but the principles apply in any language):
b'S\xc3\x83O LU\xc3\x8dS'
Decoding as ISO-8859-1 produces this string:
'SÃ\x83O LUÃ\x8dS'
UTF-8 is a multi-byte encoding, but ISO-8859-1 is a single-byte encoding. In this case the first byte of the UTF-8 encoding of both 'Ã' and 'Í' is \xc3, which is the ISO-8859-1 encoding for 'Ã'. The second byte of each character (\x83 and \x8d) falls in the range 0x80 to 0x9f, where ISO-8859-1 has no printable characters. A decoder maps these bytes to the control characters with the same code points, U+0083 and U+008D, which is why they show up in the JSON as "\u0083" and "\u008d".
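The effect can be reproduced in a few lines. This is only an illustrative sketch; the variable names are made up for the example, and it assumes the corruption is exactly one wrong decode step (UTF-8 bytes read as ISO-8859-1).

```python
# Reproduce the mojibake: encode correctly, then decode with the wrong codec.
original = "SÃO LUÍS"
utf8_bytes = original.encode("utf-8")
garbled = utf8_bytes.decode("iso-8859-1")

# Each non-ASCII character has turned into two code points:
# U+00C3 ('Ã') followed by a control character in the U+0080..U+009F range.
code_points = [f"U+{ord(ch):04X}" for ch in garbled if ord(ch) > 0x7F]
print(utf8_bytes)
print(code_points)
```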
Assuming this corrupted data is generated by the API, you will need to iterate over the deserialised JSON data and encode each string as ISO-8859-1, then decode the resulting bytes as UTF-8.
>>> bad = 'SÃ\u0083O LUÃ\u008dS'
>>> bad.encode('latin-1').decode('utf-8')
'SÃO LUÍS'
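To apply the same fix to a whole response, one option is to walk the deserialised structure recursively. The sketch below is a possible approach, not a definitive implementation: fix_mojibake is a hypothetical helper name, the sample payload is invented for illustration, and strings that cannot be round-tripped are assumed to be already correct and are left untouched.

```python
import json


def fix_mojibake(value):
    """Recursively repair strings that are UTF-8 bytes decoded as ISO-8859-1."""
    if isinstance(value, str):
        try:
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not this kind of mojibake (for example, already-correct text).
            return value
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    # Numbers, booleans and None pass through unchanged.
    return value


# Invented sample payload containing the corrupted escape sequences.
raw = '{"prop": "S\\u00c3\\u0083O LU\\u00c3\\u008dS", "items": ["PO\\u00c3\\u0087O DA PANELA"], "n": 1}'
data = fix_mojibake(json.loads(raw))
print(data)
```

The try/except is a heuristic: a correctly encoded string that happens to survive the Latin-1 to UTF-8 round trip would still be altered, so this is best applied only to responses known to be corrupted.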