
Getting UnicodeEncodeError when trying to read a JSON string from a request text

I am learning Python 3 and am currently working on a small project that involves web scraping and the json module. My script gets a string of JSON data, and when I try to load it with the json module I get the following error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u010c' in position 1: character maps to <undefined>

I am able to print the string, but not to load it with json.loads.

The code is:

import json
import pprint
jsonData = json.loads(r.text)  # r is the response object from the scraping request
pprint.pprint(jsonData)

The r.text on which it fails is:

{'event': {'sport': {'name': 'Tennis', 'homePlayer': 'Nadal', 'awayPlayer': '\u010cili\u0107'} ...

How can I avoid this error? I have tried encoding with UTF-8, but I get the same result. It would be fine if the value of the "awayPlayer" key came out as "\u010cili\u0107" or something similar instead of "Cilic" (the player's real name).
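If an ASCII-only value is acceptable, here is a minimal sketch of two standard-library ways to get one. It assumes the payload has already been parsed into a dict; the raw string below is a shortened, hypothetical stand-in for r.text with the same shape. json.dumps escapes non-ASCII characters as \uXXXX by default (ensure_ascii=True), and unicodedata.normalize with NFKD splits accented letters into a base letter plus combining marks, which an ASCII encode with errors="ignore" then drops.

```python
import json
import unicodedata

# Hypothetical, shortened stand-in for r.text (assumed to have the same shape)
raw = '{"event": {"sport": {"awayPlayer": "\\u010cili\\u0107"}}}'
data = json.loads(raw)
name = data["event"]["sport"]["awayPlayer"]  # the decoded string 'Čilić'

# Option 1: serialize back to JSON; ensure_ascii=True (the default) escapes non-ASCII
escaped = json.dumps(data)

# Option 2: decompose accented letters (NFKD), then drop the combining marks
folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")

print(escaped)  # {"event": {"sport": {"awayPlayer": "\u010cili\u0107"}}}
print(folded)   # Cilic
```

Both printed values are pure ASCII, so they can be written to any console. Option 2 loses information (Čilić and Cilic become indistinguishable), so it suits display or loose matching rather than storage.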

Thank you!

The question does not contain a sufficient minimal, complete and verifiable example. Therefore, it's hard to guess the content and encoding of the input, what r is, and possibly other unclear details.

Let's suppose that r.text contains "awayPlayer": "\u010cili\\u0107" (note the number of backslashes in \u010c vs. \\u0107):

import json
import pprint

fileData = '{"event": {"sport": {"name": "Tennis", "homePlayer": "Nadal", "awayPlayer": "\u010cili\\u0107"}}}'
jsonData = json.loads(fileData)

print('\n   print(fileData), pprint.pprint(fileData):')
print(fileData)
# print(repr(fileData)) # debugging
pprint.pprint(fileData)

print('\n   print(jsonData), pprint.pprint(jsonData):')
print(jsonData)
pprint.pprint(jsonData)
# print(jsonData['event']['sport']['awayPlayer']) # debugging

The output does not reproduce the issue (script saved as 58259849.py):

   print(fileData), pprint.pprint(fileData):
{"event": {"sport": {"name": "Tennis", "homePlayer": "Nadal", "awayPlayer": "Čili\u0107"}}}
('{"event": {"sport": {"name": "Tennis", "homePlayer": "Nadal", "awayPlayer": '
 '"Čili\\u0107"}}}')

   print(jsonData), pprint.pprint(jsonData):
{'event': {'sport': {'name': 'Tennis', 'homePlayer': 'Nadal', 'awayPlayer': 'Čilić'}}}
{'event': {'sport': {'awayPlayer': 'Čilić',
                     'homePlayer': 'Nadal',
                     'name': 'Tennis'}}}
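The output above comes from a console that can display Č. One plausible explanation for the reported error, which is an assumption since the question does not say where the script runs: a 'charmap' UnicodeEncodeError is raised when text is encoded with a legacy single-byte code page such as cp1252, which is what print does when the output stream's encoding is cp1252 (for example some Windows consoles or IDE output panes). json.loads only decodes, so under this assumption the failure would come from the pprint.pprint(jsonData) call, not from loading. The sketch below reproduces the message without any console involved, and shows errors="backslashreplace" producing the \u010cili\u0107 form the question says would be acceptable.

```python
import json

data = json.loads('{"awayPlayer": "\\u010cili\\u0107"}')
name = data["awayPlayer"]  # decoding succeeded: name is 'Čilić'

# Encode the way a cp1252 output stream would; Č and ć are not in cp1252
try:
    name.encode("cp1252")
except UnicodeEncodeError as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)
# 'charmap' codec can't encode character '\u010c' in position 0: character maps to <undefined>

# With errors="backslashreplace", unmappable characters become \uXXXX escapes
safe = name.encode("cp1252", errors="backslashreplace").decode("cp1252")
print(safe)  # \u010cili\u0107
```

If the real characters should be printed instead, Python 3.7+ can change the stream's settings at runtime, e.g. sys.stdout.reconfigure(errors="backslashreplace"), or switch its encoding to UTF-8 when the console supports it.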

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
