Python JSON decoder error with unicode characters in request content

I'm using the requests library to execute an HTTP GET that returns a JSON response. When the response string contains a Unicode character, I get this error:

json.decoder.JSONDecodeError: Invalid control character at: line 1 column 20 (char 19)

When I execute the same HTTP request with Postman, the JSON output is:

{ "value": "VILLE D\u0019ANAUNIA" }

My Python code is:

data = requests.get(uri, headers=HEADERS).text
json_data = json.loads(data)

Can I remove or replace all Unicode characters before running the conversion with json.loads(...)?

This is most likely caused by a RIGHT SINGLE QUOTATION MARK, U+2019 (’). For reasons I cannot guess, the high-order byte has been dropped, leaving you with U+0019, a control character that must be escaped in a valid JSON string.
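
To make the diagnosis concrete, here is a minimal sketch (Python 3, standard library only) showing that dropping the high-order byte of U+2019 yields U+0019, and that json.loads rejects that character when it appears unescaped inside a string. The payload is made up to match the one in the question.

```python
import json

# Dropping the high-order byte of U+2019 leaves U+0019, a control character.
mangled = chr(0x2019 & 0xFF)
print(repr(mangled))  # '\x19'

# Hypothetical payload containing the raw (unescaped) control character.
payload = '{ "value": "VILLE D' + mangled + 'ANAUNIA" }'

try:
    json.loads(payload)
    error = None
except json.JSONDecodeError as exc:
    error = exc

# The strict parser refuses raw control characters inside JSON strings.
print(type(error).__name__, error)
```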

So the correct approach is to check exactly what the API returns. If it does return a raw '\u0019' control character, you should contact the API owner, because the problem lies on that side.

As a workaround, you can limit the problem on your side by filtering out non-ASCII and control characters:

data = requests.get(uri, headers=HEADERS).text
data = ''.join((i for i in data if 0x20 <= ord(i) < 127))  # filter out unwanted chars
json_data = json.loads(data)

You should get {'value': 'VILLE DANAUNIA'}

Alternatively, you can replace all unwanted characters with spaces:

data = requests.get(uri, headers=HEADERS).text
data = ''.join((i if 0x20 <= ord(i) < 127 else ' ' for i in data))
json_data = json.loads(data)

You would get {'value': 'VILLE D ANAUNIA'}
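
Note that both filters above also discard legitimate non-ASCII characters such as accented letters. A narrower variant, sketched here as an assumption rather than something taken from the answer, removes only ASCII control characters, or puts back the apostrophe that was presumably intended. The helper names and the sample payload are hypothetical.

```python
import json

def strip_control_chars(text):
    """Drop control characters U+0000 to U+001F and U+007F, keep everything else."""
    return ''.join(ch for ch in text if ord(ch) >= 0x20 and ord(ch) != 0x7F)

def restore_apostrophe(text):
    """Replace U+0019 with U+2019, assuming that is what the API meant to send."""
    return text.replace('\x19', '\u2019')

# Hypothetical response body: one mangled apostrophe plus a legitimate accent.
raw = '{ "value": "VILLE D\x19ANAUNIA", "note": "caf\u00e9" }'

stripped = json.loads(strip_control_chars(raw))
restored = json.loads(restore_apostrophe(raw))

print(stripped)
print(restored)
```

strip_control_chars also removes tabs and newlines, which is harmless between JSON tokens but would alter any string value that legitimately contained them.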

The code below works on Python 2.7:

import json
d = json.loads('{ "value": "VILLE D\u0019ANAUNIA" }')
print(d)

The code below works on Python 3.7, where strict=False tells the decoder to accept control characters inside strings:

import json
d = json.loads('{ "value": "VILLE D\u0019ANAUNIA" }', strict=False)
print(d)

Output (as printed by Python 2.7; Python 3 prints the same dictionary without the u prefixes):

{u'value': u'VILLE D\x19ANAUNIA'}
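
The difference between the two versions comes from how the string literal itself is interpreted. In Python 3, a plain '...' literal turns \u0019 into the actual control character before json.loads sees it, whereas a raw literal keeps the six-character JSON escape, which is valid JSON and needs no strict=False. A small Python 3 sketch of that distinction:

```python
import json

plain = '{ "value": "VILLE D\u0019ANAUNIA" }'   # holds the real U+0019 character
raw = r'{ "value": "VILLE D\u0019ANAUNIA" }'    # holds backslash, u, 0, 0, 1, 9

# The raw literal is five characters longer: six escape characters versus one.
print(len(raw) - len(plain))  # 5

from_raw = json.loads(raw)                    # escaped form parses in strict mode
from_plain = json.loads(plain, strict=False)  # raw control char needs strict=False

print(from_raw == from_plain)  # True
```

This may also explain why Postman displays an escaped \u0019 while the text Python received triggered the error: the display can show the escaped form even if the body carries the raw character.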

Another point is that requests can decode the JSON response for you:

r = requests.get('https://api.github.com/events')
r.json()
