
How to fix incorrectly UTF-8 decoded string?

I'm consuming data from a RESTful API that returns strings and integer values. However, some of the string values seem to come back incorrectly encoded or decoded.

Expected string:

criança

String received:

criança

Here is my code:

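import json
import requests

# index, yesterday, app_id and app_key are defined elsewhere in the script
url = "https://analytics.us.algolia.com/2/searches?index={index}&startDate={yesterday}".format(index=index, yesterday=yesterday)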
headers = { 'X-Algolia-Application-Id': app_id,
            'X-Algolia-API-Key': app_key,
            'Content-Type': 'application/json; charset=utf-8'}

response = requests.get(url, headers=headers)
response_json = json.loads(response.text)

print(response_json)

This is for a Python 3.6.x script that gets data from Algolia's RESTful API and stores it in Amazon Redshift. I'm writing this script on Ubuntu 18.04; my terminal's locale is pt_BR.UTF-8 (echo $LANG) and its character map is UTF-8 (locale charmap).

The received data is already wrong when I print it, before storing it in the database (which is set to use charset=utf8). I also see the wrong data in the database through a SELECT statement.

I found this UTF-8 Encoding Debugging Chart, which suggests that this probably happened because UTF-8 bytes were interpreted as Windows-1252 (or ISO 8859-1) bytes.

How can I fix it using a Python function or library?

The requests library tries to guess the encoding of the response. It's possible that requests is decoding the response as cp1252 (aka Windows-1252).

I'm guessing this because if you take that text, encode it back to cp1252, and then decode it as utf-8, you'll see the correct text:

>>> 'criança'.encode('cp1252').decode('utf-8')
'criança'
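
The round trip above can be wrapped in a small helper for strings that are already damaged (for example, rows already stored in the database). This is only a sketch: the name fix_mojibake is made up for illustration, and it assumes the text is UTF-8 that was misread as cp1252. If the round trip fails, the input is returned untouched.

```python
def fix_mojibake(text, wrong_encoding="cp1252"):
    """Undo UTF-8 text that was mistakenly decoded as `wrong_encoding`.

    Hypothetical helper, not part of requests or the standard library.
    Returns the input unchanged if the round trip is not possible.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was probably fine already, or damaged in some other way.
        return text


# Reproduce the damage (UTF-8 bytes read as cp1252), then repair it.
broken = "criança".encode("utf-8").decode("cp1252")
print(broken)
print(fix_mojibake(broken))
```

Repairing after the fact is a fallback; decoding the response correctly in the first place is the cleaner fix.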

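Based on that, I'd guess that if you ask your response object what encoding it guessed, it'll tell you cp1252 (if response.encoding is None, check response.apparent_encoding, which is what requests falls back to when building response.text):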

>>> response.encoding
'cp1252'

Forcing requests to decode as utf-8 instead, like this, will probably fix your issue:

>>> response.encoding = 'utf-8'
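
Note that the assignment has to happen before response.text is read. Another option that sidesteps the guess entirely is to take the raw bytes (response.content in requests) and decode them yourself. The following is a minimal sketch, assuming the API really sends UTF-8; parse_json_bytes and the sample payload are invented for illustration, and no HTTP request is made here.

```python
import json


def parse_json_bytes(raw_body):
    """Decode a JSON body explicitly as UTF-8, then parse it.

    Hypothetical helper; raw_body plays the role of response.content.
    """
    return json.loads(raw_body.decode("utf-8"))


# Stand-in for the bytes the server would send (made-up payload).
raw_body = '{"search": "criança"}'.encode("utf-8")

# Correct: the bytes are decoded as UTF-8.
data = parse_json_bytes(raw_body)
print(data["search"])

# For contrast: the same bytes decoded as cp1252 give the garbled text.
garbled = json.loads(raw_body.decode("cp1252"))
print(garbled["search"])
```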

