JSON return with special characters

An API is returning JSON to me that contains the characters "\u0083", "\u0087" and "\u008d". I tried changing the encoding to UTF-8 and to ISO-8859-1, but neither worked. Can someone help? The API I am consuming will not be changed.

I also tried changing the encoding in the request header, without success.

Examples:

"prop": "SÃ\u0083O LUÃ\u008dS",
"prop": "RUA LUIZ GUIMARÃ\u0083ES",
"prop": "POÃ\u0087O DA PANELA"

You have UTF-8 bytes being decoded as ISO-8859-1.

'SÃO LUÍS' encoded as UTF-8 results in these bytes (the notation is Python, but the principles apply in any language):

b'S\xc3\x83O LU\xc3\x8dS' 

Decoding those bytes as ISO-8859-1 produces this string:

'SÃ\x83O LUÃ\x8dS'

UTF-8 is a multi-byte encoding, but ISO-8859-1 is a single-byte encoding, so each UTF-8 byte is decoded as a separate character. In this case the first byte of both UTF-8-encoded 'Ã' and 'Í' is \xc3, which is the ISO-8859-1 encoding of 'Ã'. The second byte of each character (\x83 and \x8d) falls in the range that ISO-8859-1 assigns to non-printing C1 control characters, so it decodes to the control characters U+0083 and U+008D. Those are the "\u0083" and "\u008d" escapes you see in the JSON.
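To see the byte-by-byte effect for the characters in the question, here is a small sketch. The helper name misdecode is made up for illustration; it only simulates the suspected bug by encoding as UTF-8 and decoding as ISO-8859-1.

```python
def misdecode(text):
    """Simulate the suspected bug: UTF-8 bytes decoded as ISO-8859-1."""
    return text.encode('utf-8').decode('latin-1')

# Show, for each affected character, its UTF-8 bytes and the code points
# that result when those bytes are read one at a time as ISO-8859-1.
for ch in 'ÃÍÇ':
    garbled = misdecode(ch)
    print(ch, ch.encode('utf-8'), [f'U+{ord(c):04X}' for c in garbled])
```

Each character becomes two: 'Ã' (U+00C3) followed by a C1 control character.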

Assuming the corrupted data is generated by the API itself, you will need to iterate over the deserialised JSON data, encode each string as ISO-8859-1, and then decode the resulting bytes as UTF-8.

>>> bad = 'SÃ\u0083O LUÃ\u008dS'
>>> bad.encode('latin-1').decode('utf-8')
'SÃO LUÍS'
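To apply the same repair across a whole response, one option is to walk the deserialised structure recursively. This is a minimal sketch, not a definitive implementation: the function name fix_mojibake and the sample payload are assumptions for illustration, and strings that cannot be round-tripped are left as they are on the assumption that they were not corrupted.

```python
import json

def fix_mojibake(value):
    """Recursively repair strings that are UTF-8 bytes decoded as ISO-8859-1."""
    if isinstance(value, str):
        try:
            return value.encode('latin-1').decode('utf-8')
        except (UnicodeEncodeError, UnicodeDecodeError):
            # The round trip failed, so assume the string was not corrupted
            # in this way and return it unchanged.
            return value
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    # Numbers, booleans and None pass through untouched.
    return value

# Hypothetical payload modelled on the examples in the question.
raw = '{"prop": "S\\u00c3\\u0083O LU\\u00c3\\u008dS"}'
data = fix_mojibake(json.loads(raw))
print(data)
```

Plain ASCII strings survive the round trip unchanged, and a correctly decoded string such as 'SÃO LUÍS' fails the UTF-8 decode and is returned as is. In rare cases a correct string could happen to form a valid UTF-8 sequence and be altered, so it is worth checking the results against known values.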
