
JSON response with special characters

An API is returning JSON that contains the characters "\u0083", "\u0087" and "\u008d". I tried changing the encoding to UTF-8 and to ISO-8859-1, without success. Can anyone help? The API I am consuming will not be changed.

I also changed the encoding in the request header, but that did not work either.

Examples:

'''
"prop": "SÃ\u0083O LUÃ\u008dS",
"prop": "RUA LUIZ GUIMARÃ\u0083ES",
"prop": "POÃ\u0087O DA PANELA"
'''

You have UTF-8 bytes being decoded as ISO-8859-1.

'SÃO LUÍS' encoded as UTF-8 results in these bytes (the notation is Python, but the principles apply in any language):

b'S\xc3\x83O LU\xc3\x8dS' 

Decoding as ISO-8859-1 produces this string:

'SÃ\x83O LUÃ\x8dS'

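UTF-8 is a multi-byte encoding, but ISO-8859-1 is a single-byte encoding, so each byte of the UTF-8 data is decoded as a separate character. The first byte of both UTF-8 encoded 'Ã' and 'Í' is \xc3, which is the ISO-8859-1 encoding for 'Ã'. The second bytes (\x83 and \x8d) fall in the range 0x80-0x9F, where ISO-8859-1 defines no printable characters, so the decoder passes them through as the control code points U+0083 and U+008D. These are the "\u0083" and "\u008d" escapes that appear in the JSON.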
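To make the byte-to-character mapping concrete, here is a small sketch that reproduces the corruption and prints each UTF-8 byte next to the code point that ISO-8859-1 decoding assigns to it. The variable names are only illustrative.

```python
# Reproduce the corruption: encode as UTF-8, then decode as ISO-8859-1.
original = 'SÃO LUÍS'
utf8_bytes = original.encode('utf-8')
mojibake = utf8_bytes.decode('iso-8859-1')

# ISO-8859-1 maps every byte to the code point with the same number,
# so each UTF-8 byte becomes its own character.
for byte, char in zip(utf8_bytes, mojibake):
    print(f'0x{byte:02x} -> U+{ord(char):04X}')
```

The two-byte sequences 0xc3 0x83 and 0xc3 0x8d each turn into two characters, which is why the corrupted string is longer than the original.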

Assuming the API itself generates this corrupted data, you will need to iterate over the deserialised JSON data, encode each string as ISO-8859-1 (Python calls this codec 'latin-1'), and then decode the resulting bytes as UTF-8.

>>> bad = 'SÃ\u0083O LUÃ\u008dS'
>>> bad.encode('latin-1').decode('utf-8')
'SÃO LUÍS'
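Applied to a whole response, the same round trip could look like the sketch below. The function name fix_mojibake and the sample payload are made up for illustration (the payload reuses the strings from the question), and the real structure of the API response is an assumption. Strings that do not survive the round trip are returned unchanged, on the assumption that they were not corrupted in the first place.

```python
import json

def fix_mojibake(value):
    """Recursively repair strings that hold UTF-8 bytes decoded as ISO-8859-1.

    Hypothetical helper: walks dicts, lists and strings produced by json.loads.
    """
    if isinstance(value, str):
        try:
            return value.encode('latin-1').decode('utf-8')
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not representable in latin-1, or not valid UTF-8 afterwards:
            # assume the string was never corrupted and keep it as is.
            return value
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    return value  # numbers, booleans and None pass through untouched

# Sample payload (an assumption, modelled on the examples in the question).
raw = (r'[{"prop": "S\u00c3\u0083O LU\u00c3\u008dS"}, '
       r'{"prop": "PO\u00c3\u0087O DA PANELA"}, '
       r'{"prop": "RUA LUIZ GUIMAR\u00c3\u0083ES"}]')

data = fix_mojibake(json.loads(raw))
print(data)
# [{'prop': 'SÃO LUÍS'}, {'prop': 'POÇO DA PANELA'}, {'prop': 'RUA LUIZ GUIMARÃES'}]
```

Catching the exceptions keeps already-correct values intact, but it is a heuristic: a legitimate string that happens to form valid UTF-8 after the latin-1 encode would also be rewritten, so it is worth checking the output against known values.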
