json.load gives an error when loading a file with a Latin character

I am working on a Python project and I am really confused about this whole UTF-8/Latin-1 encoding/decoding subject.

My Linux system is an OpenShift free account.

I am trying to load a file which contains a JSON data object. The object has an entry that contains a Latin character.

test.json:

 {
 "name" : "Corazón"
 }

When I load it on my Windows system I don't get an error, but the result after the json.load is:

Windows output:

Corazón

OpenShift Linux system traceback:

    data = json.load(data_file, encoding='utf-8')
  File "/opt/rh/python33/root/usr/lib64/python3.3/json/__init__.py", line 271, in load
    return loads(fp.read(),
  File "/opt/rh/python33/root/usr/lib64/python3.3/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in range(128)

My code is:

import json

with open("test.json") as data_file:
    data = json.load(data_file, encoding='utf-8')

print(data['name'])

I have tried different encodings ('utf-8', 'ascii', 'latin-1') and they all give me the same results. I am obviously missing something here. Also, as you can see, I get different results from Python on Windows and on Linux.

How should I configure json.load so it can load the file correctly on both Windows and Linux Python systems?

Update 1

I have run some more tests. The 'test.json' file is UTF-8 encoded and still gives me the above results. When I encode the file as ISO 8859-1 the Windows output is correct, but the Linux box still raises the error.

I have even copied and pasted the test.json file from this SO question to run my tests, so that I am on the same page as everybody else.

Update 2

If I convert my test.json file to 'Windows-1252' format the Windows output is correct. The Linux box still results in the same error. I'm not sure why the Windows box does not work when the file is converted to UTF-8.

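In Python 3, the character encoding / decoding is handled by the file object itself, so the encoding argument passed to json.load has no effect on how the file is decoded. Without an explicit encoding, open() falls back to the platform's locale encoding (see locale.getpreferredencoding()), which is why the two systems behave differently. Specify the encoding in the open() call: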

import json
with open("test.json", encoding='utf-8') as data_file:                           
    data = json.load(data_file)

print(data['name'])

It will correctly load the file on both platforms, if the file is correctly encoded as UTF-8.

It will never raise the UnicodeDecodeError you showed, because it no longer uses the ascii codec.
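To see why the choice of codec matters, here is a minimal sketch that decodes the same UTF-8 bytes with three different codecs. The helper name try_decode is made up for this illustration, and ascii() is used when printing so the demo itself works on any console.

```python
import locale

# The bytes of a file like test.json from the question, encoded as UTF-8.
raw = '{\n"name" : "Corazón"\n}'.encode("utf-8")

# The encoding open() falls back to when no encoding argument is given.
# This is platform dependent, so no particular value is assumed here.
print(locale.getpreferredencoding(False))


def try_decode(data, encoding):
    """Hypothetical helper: return the decoded text, or the error message."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: {}".format(exc)


# ascii fails on byte 0xc3, like the traceback in the question.
print(ascii(try_decode(raw, "ascii")))
# cp1252 does not fail, but the two UTF-8 bytes of "ó" become two characters.
print(ascii(try_decode(raw, "cp1252")))
# utf-8 yields the intended text, with "ó" as the single code point U+00F3.
print(ascii(try_decode(raw, "utf-8")))
```

This would be consistent with the symptoms described: an ASCII locale on the Linux box produces the error, while a Windows-1252 default on Windows decodes without error but garbles the name.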

If you print to the console, the console's encoding (code page) must contain all the characters you're printing, otherwise print() will raise a UnicodeEncodeError.
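If the console encoding is not under your control, one possible workaround is to escape whatever it cannot represent before printing. The sketch below is only an illustration: console_safe is a made-up helper name, and falling back to ASCII when sys.stdout.encoding is unavailable is an assumption.

```python
import sys


def console_safe(text, encoding=None):
    """Hypothetical helper: escape characters the encoding cannot represent."""
    # Assumption: fall back to ASCII if the stream reports no encoding.
    encoding = encoding or sys.stdout.encoding or "ascii"
    # backslashreplace turns unencodable characters into escapes like \xf3.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


print(sys.stdout.encoding)      # the encoding print() will use
print(console_safe("Corazón"))  # never raises UnicodeEncodeError
```

On a UTF-8 console the text is printed unchanged; on an ASCII-only console the "ó" is shown as an escape sequence instead of causing an exception.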

I recommend you do not fall back to the ISO 8859-1 codec. Codecs like that should be considered legacy codecs on the internet. Sticking to UTF-8 will save you a lot of other headaches with handling names (and other text) in different languages.
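If a file is already stored in a legacy encoding, one way to follow this advice is to convert it once and read it as UTF-8 from then on. This is a rough sketch, assuming you know the file's current encoding; convert_to_utf8 is a made-up name, and the demo runs on a throwaway temporary file rather than your real data.

```python
import json
import os
import tempfile


def convert_to_utf8(path, source_encoding):
    """Hypothetical helper: rewrite the file at path as UTF-8 in place."""
    # Assumption: the caller knows the file really is in source_encoding.
    with open(path, encoding=source_encoding) as src:
        text = src.read()
    with open(path, "w", encoding="utf-8") as dst:
        dst.write(text)


# Demo: create a throwaway file saved as Windows-1252.
path = os.path.join(tempfile.mkdtemp(), "test.json")
with open(path, "w", encoding="cp1252") as f:
    f.write('{"name": "Corazón"}')

convert_to_utf8(path, "cp1252")

# From now on the file can be loaded the same way on every platform.
with open(path, encoding="utf-8") as data_file:
    data = json.load(data_file)

print(ascii(data["name"]))
```

Keeping a backup before converting real files in place is a sensible precaution, since a wrong guess for the source encoding would silently garble the text.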
