
Python 3 not reading a JSON file right

I have some JSON files created by PowerShell using the ConvertTo-Json command. The content of a JSON file looks like this:

{
    "Key1":  "Value1",
    "Key2":  "Value2"
}

I ran the Python interpreter to see whether I could read the file, but I get this weird output:

>>> f=open('test.json', 'r')
>>> f.read()
'ÿ\xfe{\x00\n\x00\n\x00 \x00 \x00 \x00 \x00"\x00K\x00e\x00y\x001\x00"\x00:\x00 \x00 \x00"\x00V\x00a\x00l\x00u\x00e\x001\x00"\x00,\x00\n\x00\n\x00 \x00 \x00 \x00 \x00"\x00K\x00e\x00y\x002\x00"\x00:\x00 \x00 \x00"\x00V\x00a\x00l\x00u\x00e\x002\x00"\x00\n\x00\n\x00}\x00\n\x00\n\x00'

For some reason, every character is followed by an escaped \x00 byte, and there's a weird ÿ at the beginning (a PowerShell error?).
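A minimal sketch that reproduces this symptom, assuming the file holds UTF-16LE text preceded by a BOM, uses CRLF line endings, and that open() without an encoding argument fell back to cp1252 (a common Windows default; the actual locale encoding on this machine is an assumption). The BOM bytes FF FE decode to ÿ and U+00FE (the \xfe in the output above), and each ASCII character is followed by its \x00 high byte. The resulting string cannot start a JSON value, which matches the traceback:

```python
import codecs
import json

# Assumed file contents: CRLF line endings, as typically written on Windows
text = '{\r\n    "Key1":  "Value1",\r\n    "Key2":  "Value2"\r\n}\r\n'

# Assumption: the file is UTF-16LE preceded by the BOM bytes FF FE
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Decoding with a single-byte codec (cp1252 assumed) splits every
# 2-byte UTF-16 code unit into two separate characters
garbled = raw.decode("cp1252")
print(ascii(garbled[:4]))  # '\xff\xfe{\x00'

try:
    json.loads(garbled)
except json.JSONDecodeError as exc:
    # 'ÿ' is neither whitespace nor the start of a JSON value
    print(exc)  # Expecting value: line 1 column 1 (char 0)
```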

The weird thing is this:

>>> f=open('test.json', 'r')
>>> str=f.read()
>>> type(str)
<class 'str'>
>>> json.loads(str)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Rutvik_Choudhary\AppData\Local\Programs\Python\Python35-32\lib\json\__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "C:\Users\Rutvik_Choudhary\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\Rutvik_Choudhary\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

So the input is a string, but the json module can't parse it (json.load(f) returns the same error). What is causing this error? Is it a Python thing, a PowerShell thing, or a JSON thing?

It seems that you have a BOM (byte order mark) at the start of your file. You can verify this in a hex editor or with a good text editor (Notepad++ shows whether a BOM is present).
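Besides a hex editor, you can peek at the first bytes from Python. A minimal sketch: detect_bom is a hypothetical helper, the byte sequences come from the standard codecs.BOM_* constants, and each BOM is mapped to a codec name that both understands it and strips it on decode:

```python
import codecs
import os
import tempfile

# Longest BOMs first: the UTF-32-LE BOM (FF FE 00 00) starts with
# the UTF-16-LE BOM (FF FE), so it must be checked before it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_bom(path):
    """Hypothetical helper: return a codec name matching the file's BOM, or None."""
    with open(path, "rb") as f:  # binary mode, so nothing is decoded yet
        head = f.read(4)
    for bom, codec in _BOMS:
        if head.startswith(bom):
            return codec
    return None


# Demo with a throwaway file that mimics the UTF-16LE output
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(codecs.BOM_UTF16_LE + '{"Key1": "Value1"}'.encode("utf-16-le"))
print(detect_bom(tmp.name))  # utf-16
os.remove(tmp.name)
```

The returned name can be passed straight to open(..., encoding=...).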

As pointed out by jwodder, PowerShell has encoded your JSON using UTF-16LE. To load this data with json correctly, you need to open the file using the correct encoding, e.g.:

import json

# "utf-16" reads the BOM to choose little- or big-endian byte order
with open("test.json", "r", encoding="utf-16") as f:
    json_string = f.read()
my_dict = json.loads(json_string)
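Two variants of the same fix, sketched under the same assumption (a UTF-16 file with a BOM; a throwaway temp file stands in for test.json). json.load can read directly from the decoded file object, and on Python 3.6 or later json.loads also accepts bytes and detects UTF-8, UTF-16 or UTF-32 on its own. The traceback in the question shows Python 3.5, where the bytes variant is not available:

```python
import codecs
import json
import os
import tempfile

data = '{"Key1": "Value1", "Key2": "Value2"}'

# Throwaway file standing in for test.json: UTF-16LE preceded by a BOM
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(codecs.BOM_UTF16_LE + data.encode("utf-16-le"))

# Variant 1: pass the decoded text stream straight to json.load
with open(tmp.name, "r", encoding="utf-16") as f:
    from_text = json.load(f)

# Variant 2 (Python 3.6+): json.loads accepts bytes and detects UTF-16 from the BOM
with open(tmp.name, "rb") as f:
    from_bytes = json.loads(f.read())

os.remove(tmp.name)
print(from_text == from_bytes)  # True
```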

You don't need to tell Python which variant of UTF-16 is being used; that is the purpose of the first two bytes of the text file. They are called the Byte Order Mark (BOM), and they let a program know whether UTF-16LE or UTF-16BE was used to encode the file.
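A small sketch of that mechanism: the same text encoded as UTF-16LE and as UTF-16BE, each preceded by its BOM, decodes identically with the generic utf-16 codec, which reads the BOM to pick the byte order and then drops it. Naming the byte order explicitly (utf-16-le) keeps the BOM in the string as the character U+FEFF, which json.loads refuses:

```python
import codecs
import json

text = '{"Key1": "Value1"}'

# The same text in both byte orders, each preceded by the matching BOM
le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")  # starts with FF FE
be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")  # starts with FE FF

# The generic "utf-16" codec reads the BOM, picks the byte order, and strips it
assert le.decode("utf-16") == be.decode("utf-16") == text

# An explicit byte order leaves the BOM in place as U+FEFF
with_bom = le.decode("utf-16-le")
print(ascii(with_bom[0]))  # '\ufeff'

try:
    json.loads(with_bom)
except json.JSONDecodeError as exc:
    print(exc)  # json rejects a leading U+FEFF
```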

If you want to load text files with Unicode BOM headers like yours, you could use codecs.open with an explicit encoding instead of open, because the default open (called without an encoding argument) decodes with the locale's default encoding and does not interpret the BOM.
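A minimal sketch of that approach, assuming the file is UTF-16 with a BOM (a temp file stands in for it). codecs.open only decodes when it is given an encoding; in Python 3 the built-in open with the same encoding argument reads the same text, so both calls below produce the same dict. For UTF-8 files that start with a BOM, the analogous codec name is utf-8-sig:

```python
import codecs
import json
import os
import tempfile

payload = {"Key1": "Value1", "Key2": "Value2"}

# Throwaway UTF-16 file with a BOM, standing in for the PowerShell output
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(codecs.BOM_UTF16_LE + json.dumps(payload).encode("utf-16-le"))

# codecs.open decodes (and strips the BOM) only because encoding is given
with codecs.open(tmp.name, "r", encoding="utf-16") as f:
    via_codecs = json.load(f)

# The built-in open with the same encoding gives the same result in Python 3
with open(tmp.name, "r", encoding="utf-16") as f:
    via_open = json.load(f)

os.remove(tmp.name)
print(via_codecs == via_open == payload)  # True
```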

Or you can have a look at tendo.unicode, a small library I wrote that can make life easier for people who are not used to Unicode text.
