
json.load gives an error when loading a file with a Latin character

I am working on a Python project and I am really confused about this whole UTF-8/Latin-1 encoding/decoding subject.

My Linux system is a free OpenShift account.

I am trying to load a file that contains a JSON object. The object has an entry that contains a Latin character.

test.json:

 {
 "name" : "Corazón"
 }

When I load it on my Windows system I don't get an error, but the result after json.load is:

Windows Output:

Corazón

OpenShift Linux system traceback:

    data = json.load(data_file, encoding='utf-8')
  File "/opt/rh/python33/root/usr/lib64/python3.3/json/__init__.py", line 271, in load
    return loads(fp.read(),
  File "/opt/rh/python33/root/usr/lib64/python3.3/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in range(128)
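The traceback can be reproduced without any file. A minimal sketch, assuming the file is saved as UTF-8 with the layout shown above and that the OpenShift box's default text encoding is ASCII (which the `encodings/ascii.py` frame suggests): decoding those bytes with the ascii codec fails at offset 18, the first byte of `ó`.

```python
# Bytes of test.json as saved in UTF-8 (assumed layout: one space before "name").
raw = b'{\n "name" : "Coraz\xc3\xb3n"\n}'

failed_at = None
try:
    # This is what open() effectively does when the locale encoding is ASCII.
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    # exc.start is the offset of the first byte the codec rejected.
    failed_at = exc.start

print(failed_at, hex(raw[failed_at]))  # 18 0xc3
```

The offset and byte match the error message exactly, which points at the decoding step inside open(), not at the JSON parser.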

My code is:

import json

with open("test.json") as data_file:
    data = json.load(data_file, encoding='utf-8')

print(data['name'])

I have tried different encodings ('utf-8', 'ascii', 'latin-1') and they all give me the same results. I'm obviously missing something here. Plus, as you can see, I get different results from Windows and Linux Python.

How should I configure json.load so that it loads the file correctly on both Windows and Linux Python?

Update 1

I have run some more tests. The test.json file is UTF-8 encoded and still gives me the above results. When I encode the file as ISO 8859-1, the Windows output is correct, but the Linux output still shows the error.

I have even copied and pasted the test.json content from this SO question so that I'm testing the same thing as everybody else.
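Re-encoding the file changes which bytes represent `ó`, but not whether they are ASCII. A small sketch of the two representations: in both cases the byte is above 127, so an ASCII decoder rejects either file (for the ISO 8859-1 version the offending byte would be 0xf3 rather than 0xc3).

```python
ch = 'ó'
utf8_bytes = ch.encode('utf-8')         # two bytes: b'\xc3\xb3'
latin1_bytes = ch.encode('iso-8859-1')  # one byte:  b'\xf3'

for name, data in (('utf-8', utf8_bytes), ('iso-8859-1', latin1_bytes)):
    # Any byte >= 0x80 lies outside ASCII, so decoding with 'ascii' fails.
    is_ascii = all(b < 0x80 for b in data)
    print(name, data, 'ASCII-safe' if is_ascii else 'not ASCII')
```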

Update 2

If I convert my test.json file to Windows-1252, the Windows output is correct. The Linux box still gives the same error. I'm not sure why the Windows box doesn't work when the file is converted to UTF-8.
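A likely explanation for the Windows behaviour, assuming that machine's default encoding is Windows-1252 (cp1252, common on Western European and US installs): open() without an encoding decodes the file as cp1252, which is right for a cp1252 file and produces mojibake for a UTF-8 one. A sketch:

```python
utf8_file = 'Corazón'.encode('utf-8')     # b'Coraz\xc3\xb3n'
cp1252_file = 'Corazón'.encode('cp1252')  # b'Coraz\xf3n'

# Decode both with the (assumed) Windows default codec.
good = cp1252_file.decode('cp1252')  # bytes match the codec: 'Corazón'
bad = utf8_file.decode('cp1252')     # each UTF-8 byte read as its own character

# ascii() keeps the printout readable on any console.
print(ascii(good), ascii(bad))  # 'Coraz\xf3n' 'Coraz\xc3\xb3n'
```

Here `bad` is the string `CorazÃ³n`: no exception is raised, because every byte happens to map to some cp1252 character.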

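Answer

In Python 3, decoding is handled by the file object itself: in text mode, open() turns the bytes into str before json.load() ever sees them, so the encoding argument to json.load() is ignored (it was deprecated and then removed in Python 3.9). That is why trying different values made no difference. When you don't pass an encoding, open() uses the platform's default locale encoding, which differs between your Windows and Linux machines. Specify the encoding in the open() call instead: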

import json
with open("test.json", encoding='utf-8') as data_file:                           
    data = json.load(data_file)

print(data['name'])

It will correctly load the file on both platforms, if the file is correctly encoded as UTF-8.
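A self-contained round trip, as a sketch: it writes a hypothetical scratch copy of test.json explicitly as UTF-8 and reads it back with the same explicit encoding, so neither step depends on the platform's locale.

```python
import json
import os
import tempfile

# Hypothetical scratch location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'test.json')

# Write explicitly as UTF-8; ensure_ascii=False keeps 'ó' as real UTF-8 bytes.
with open(path, 'w', encoding='utf-8') as f:
    json.dump({'name': 'Corazón'}, f, ensure_ascii=False)

# Read back with the same explicit encoding.
with open(path, encoding='utf-8') as data_file:
    data = json.load(data_file)

# Compare instead of printing the name, so an ASCII console cannot interfere.
print(data['name'] == 'Corazón')  # True
```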

It will never raise the UnicodeDecodeError you showed, because it no longer uses the ascii codec.
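The ascii codec came from the default that open() falls back to when no encoding is passed; on Python 3.3 that is `locale.getpreferredencoding(False)`. Assuming the OpenShift box runs under the C/POSIX locale, this typically reports ASCII (newer Pythons, 3.7 and later, often switch to UTF-8 in that situation). A quick check, using a hypothetical scratch file:

```python
import locale
import os
import tempfile

# What open() would use if no encoding were given (platform and locale dependent).
print('default:', locale.getpreferredencoding(False))

path = os.path.join(tempfile.mkdtemp(), 'probe.txt')  # hypothetical scratch file
with open(path, 'w', encoding='utf-8') as f:
    f.write('Corazón')

# An explicit encoding overrides the locale default entirely.
with open(path, encoding='utf-8') as f:
    explicit = f.encoding
    text = f.read()

print('explicit:', explicit)  # explicit: utf-8
```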

If you print to the console, the console's code page (encoding) must contain all the characters you're printing; otherwise print() will raise a UnicodeEncodeError.
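The print() failure can be simulated without a real terminal. This sketch assumes an ASCII-only console and stands one in with an in-memory text stream; the character at index 5 (`ó`) cannot be encoded.

```python
import io

# Stand-in for a console whose encoding is ASCII (an assumption for illustration).
console = io.TextIOWrapper(io.BytesIO(), encoding='ascii')

error = None
try:
    print('Corazón', file=console)
    console.flush()
except UnicodeEncodeError as exc:
    error = exc

print(type(error).__name__, error.start)  # UnicodeEncodeError 5
```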

I recommend that you don't fall back to the ISO 8859-1 codec. Single-byte codecs like ISO 8859-1 and Windows-1252 should be considered legacy codecs on the internet. Sticking to UTF-8 will save you a lot of other headaches with handling names (and other text) in different languages.
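To make the "different languages" point concrete, here is a sketch with hypothetical sample names: ISO 8859-1 covers `Corazón` but not Polish, Greek, or Japanese names, while UTF-8 can encode all of them.

```python
def fits_latin1(text):
    """Return True if every character of text exists in ISO 8859-1."""
    try:
        text.encode('iso-8859-1')
    except UnicodeEncodeError:
        return False
    return True

names = ['Corazón', 'Łódź', 'Ελλάδα', '東京']  # hypothetical sample names
results = {name: fits_latin1(name) for name in names}

for name, ok in results.items():
    # ascii() keeps the output printable even on an ASCII-only console.
    print(ascii(name), 'ISO 8859-1:', ok, 'UTF-8 bytes:', len(name.encode('utf-8')))
```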
