
Python JSON decoder error with unicode characters in request content

I'm using the requests library to execute an HTTP GET that returns a JSON response. When the response string contains a Unicode character, I get this error:

json.decoder.JSONDecodeError: Invalid control character at: line 1 column 20 (char 19)

When I execute the same HTTP request with Postman, the JSON output is:

{ "value": "VILLE D\u0019ANAUNIA" }

My Python code is:

data = requests.get(uri, headers=HEADERS).text
json_data = json.loads(data)

Can I remove or replace all Unicode characters before converting with json.loads(...)?

This is likely caused by a RIGHT SINGLE QUOTATION MARK, U+2019 (’). For reasons I cannot guess, the high-order byte has been dropped, leaving you with the control character U+0019, which must be escaped inside a valid JSON string.
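The truncation described above can be checked with a few lines of Python 3. This is only an illustrative sketch: the names `truncated`, `raw_body`, `escaped_body` and `parses_strictly` are made up for the example, and the body is rebuilt by hand from the Postman output rather than fetched from the real API.

```python
import json

RIGHT_SINGLE_QUOTE = "\u2019"                     # the character the API presumably meant
truncated = chr(ord(RIGHT_SINGLE_QUOTE) & 0xFF)   # keep only the low-order byte

# Same payload twice: once with the raw control character inside the string,
# once with that character written as a JSON escape sequence.
raw_body = '{ "value": "VILLE D' + truncated + 'ANAUNIA" }'
escaped_body = '{ "value": "VILLE D\\u0019ANAUNIA" }'

def parses_strictly(text):
    """Return True if json.loads accepts text with its default strict=True."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(repr(truncated))                # the low byte of U+2019 is U+0019
print(parses_strictly(raw_body))      # raw control character: rejected
print(parses_strictly(escaped_body))  # escaped control character: accepted
```

Dropping the high byte of U+2019 leaves U+0019. The default parser rejects that character only when it appears unescaped inside a string.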

So the correct approach is to check what exactly the API returns. If it does return a raw control character (U+0019), you should contact the API owner, because the problem is on their side.

As a workaround, you can limit the problem on your side by filtering out non-ASCII and control characters:

data = requests.get(uri, headers=HEADERS).text
data = ''.join((i for i in data if 0x20 <= ord(i) < 127))  # filter out unwanted chars
json_data = json.loads(data)

You should get {'value': 'VILLE DANAUNIA'}

Alternatively, you can replace all unwanted characters with spaces:

data = requests.get(uri, headers=HEADERS).text
data = ''.join((i if 0x20 <= ord(i) < 127 else ' ' for i in data))
json_data = json.loads(data)

You would get {'value': 'VILLE D ANAUNIA'}
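Both filters above also discard every legitimate non-ASCII character, such as accented letters. If that is a concern, a narrower variant might strip only control characters. The sketch below is an assumption-laden illustration, not part of the original answer: `strip_control_chars` is a hypothetical helper and `sample` is an invented payload.

```python
import json

def strip_control_chars(text):
    """Drop C0 control characters (U+0000..U+001F) and DEL; keep everything else."""
    return "".join(ch for ch in text if ord(ch) >= 0x20 and ord(ch) != 0x7F)

# Invented payload: one stray control character plus one legitimate accented letter.
sample = '{ "value": "VILLE D\x19ANAUNIA", "note": "caff\u00e8" }'

cleaned = json.loads(strip_control_chars(sample))
print(cleaned)
```

With this variant the accented letter in "note" survives, whereas the ASCII-only filter would have removed it. Note that it also removes tabs and newlines from the whole body, which is harmless between JSON tokens but changes any string value that contained them.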

The code below works on Python 2.7:

import json
d = json.loads('{ "value": "VILLE D\u0019ANAUNIA" }')
print(d)

The code below works on Python 3.7:

import json
d = json.loads('{ "value": "VILLE D\u0019ANAUNIA" }', strict=False)
print(d)

Output (as printed by Python 2.7; Python 3 prints the same dictionary without the u prefixes):

{u'value': u'VILLE D\x19ANAUNIA'}
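Parsing with strict=False keeps the stray '\x19' in the result. If the guess that it was originally U+2019 is right, the value could be repaired after parsing. The following Python 3 sketch rests on that assumption. `load_and_repair` and its `bad`/`good` parameters are hypothetical names introduced only for illustration.

```python
import json

def load_and_repair(text, bad="\x19", good="\u2019"):
    """Parse leniently, then swap the truncated character for the presumed original."""
    def fix(value):
        if isinstance(value, str):
            return value.replace(bad, good)
        if isinstance(value, list):
            return [fix(item) for item in value]
        if isinstance(value, dict):
            # Only values are repaired; keys are left untouched.
            return {key: fix(item) for key, item in value.items()}
        return value

    # strict=False lets raw control characters through inside JSON strings.
    return fix(json.loads(text, strict=False))

body = '{ "value": "VILLE D\x19ANAUNIA" }'
result = load_and_repair(body)
print(result)
```

This only makes sense while every U+0019 in the data really is a damaged U+2019. Fixing the API output remains the cleaner solution.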

Another point is that requests can return the response body already parsed as JSON:

r = requests.get('https://api.github.com/events')
r.json()
