Converting a string to a dictionary

Question

I am reading multiple JSON files and storing the contents of each json_file in json_text, like this:

json_text = json_file.read()

When I print json_text, I get the following:

{
  "speech": {
    "text": "<p>Lords</p><p>We are all in the same boat</p><p>It is time for us to help</p>",
    "id": null,
    "doc_id": null,
    "fave": "N",
    "system": "2015-09-24 13:00:17"
 }
}
<type 'str'>

I thought I could turn it into a dictionary with json.loads(), but that doesn't work:

ValueError: No JSON object could be decoded

Apparently loads() does not recognize json_text as JSON, even though it is valid JSON according to http://jsonlint.com, so I figured I would call dumps() first and then loads():

json_dumps = json.dumps(json_text)
json_loads = json.loads(json_dumps)
print json_loads, type(json_loads)

This gives:

{
  "speech": {
    "text": "<p>Lords</p><p>We are all in the same boat</p><p>It is time for us to help</p>",
    "id": null,
    "doc_id": null,
    "fave": "N",
    "system": "2015-09-24 13:00:17"
 }
}
<type 'unicode'>

I also tried ast.literal_eval() on json_text, but I got:

ValueError: malformed string

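So, the scenario is this: I have multiple JSON files in a folder. I want to load these files, pull out specific keys, and store them in a pandas DataFrame. I have already tried pd.read_json(), but it only tells me that something is wrong with the JSON.

Here is my code: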

import json
import os

path_to_json = 'folder/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]

for index, js in enumerate(json_files):
    with open(os.path.join(path_to_json, js)) as json_file:
        json.load(json_file)

This gives ValueError: No JSON object could be decoded, which is why I tried json_file.read() and the other approaches above.

Answer 1

As I mentioned in the comments, an encoding that is not ASCII-based can also cause this ValueError. For example, the following json.loads call fails:

>>> json.loads(u'{"id": null}'.encode("utf16"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

One way to check the encoding is print(repr(json_text)), which exposes any extra bytes (UTF-16 in this example):

>>> print(repr(u'{"id": null}'.encode("utf16")))
'\xff\xfe{\x00"\x00i\x00d\x00"\x00:\x00 \x00n\x00u\x00l\x00l\x00}\x00'
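The leading \xff\xfe in that output is a byte order mark, and checking for one can be automated. The following is only a minimal sketch, written for Python 3 (the rest of this answer uses Python 2); guess_bom_encoding is a hypothetical helper name, not part of any library, and it only recognizes files that actually start with a BOM.

```python
import codecs

def guess_bom_encoding(raw):
    """Return a codec name if raw starts with a known BOM, else None."""
    # UTF-32 must be checked before UTF-16, because the UTF-32-LE BOM
    # begins with the same two bytes as the UTF-16-LE BOM.
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in boms:
        if raw.startswith(bom):
            return name
    return None

raw = u'{"id": null}'.encode("utf-16")  # same bytes as in the repr above
detected = guess_bom_encoding(raw)
plain = guess_bom_encoding(b'{"id": null}')  # no BOM, so nothing is detected
print(detected, plain)
```

A None result does not prove the data is ASCII or UTF-8; it only means no BOM was found.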

Both json.load and json.loads accept an encoding parameter in Python 2. However, it only works for ASCII-based encodings, so with UTF-16 you get the same ValueError:

>>> json.loads(u'{"id": null}'.encode("utf16"), encoding="utf16")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/json/__init__.py", line 352, in loads
    return cls(encoding=encoding, **kw).decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

If that is the case (and you can identify the offending encoding), you can decode the string manually:

json_text = json_text.decode("utf16")
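To make the decode-then-parse order concrete, here is a small sketch written for Python 3, where the data read from a binary file is a bytes object. The sample bytes are made up to stand in for your file contents, and UTF-16 is still just an assumed encoding.

```python
import json

# Stand-in for the raw bytes you would get from a file opened in binary mode.
raw = u'{"id": null, "fave": "N"}'.encode("utf-16")

# Decode first: the utf-16 codec consumes the BOM and yields ordinary text.
json_text = raw.decode("utf-16")

# Only then hand the text to the JSON parser.
data = json.loads(json_text)
print(data)
```

As far as I know, json.loads in Python 3.6 and later also accepts bytes directly and detects UTF-8, UTF-16 and UTF-32 by itself, but decoding explicitly keeps the assumption visible.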

Alternatively, you can use codecs.open to load the file:

with codecs.open(json_file_name, "r", encoding="utf16") as f:
    print(json.load(f))
    # or
    # json_text = f.read()

(Note that I am using UTF-16 here, but that may not be the right encoding in your case.)
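If you want to try the file-based approach without touching your real data, a self-contained round trip might look like the sketch below. It uses io.open, which takes an encoding argument much like codecs.open and is available in both Python 2.7 and Python 3; the temporary file name and its contents are invented for the demonstration.

```python
import io
import json
import os
import tempfile

# Write a throwaway UTF-16 file so the sketch does not depend on real data.
tmp_dir = tempfile.mkdtemp()
json_file_name = os.path.join(tmp_dir, "speech.json")
with io.open(json_file_name, "w", encoding="utf-16") as f:
    f.write(u'{"speech": {"fave": "N", "id": null}}')

# The file object decodes while reading, so json.load receives text.
with io.open(json_file_name, "r", encoding="utf-16") as f:
    data = json.load(f)

print(data["speech"]["fave"])
```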

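Judging from your JSON text, the characters themselves are all ASCII, so any ASCII-based encoding (latin-1, for example) would still work without any decoding step, because JSON content made up of those characters is byte-for-byte the same whether it is encoded as ASCII, UTF-8, or latin-1.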
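You can check that claim yourself with a few lines. This is a sketch with a shortened sample string, not your actual file contents:

```python
text = u'{"fave": "N", "id": null}'  # ASCII-only sample text

# Encode the same text with three ASCII-compatible codecs.
encoded = [text.encode(name) for name in ("ascii", "utf-8", "latin-1")]

# All three byte strings are identical for ASCII-only content.
same = all(e == encoded[0] for e in encoded)

# UTF-16 is not ASCII-compatible, so its bytes differ.
differs = text.encode("utf-16") != encoded[0]
print(same, differs)
```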

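As a side note, you dumped the text and then loaded it back, which gave you a unicode object: dumps() wrapped your text in a JSON string literal, and loads() simply unwrapped it again. In theory (if my answer is correct), you should be able to actually load json_loads (that is, json.loads(json_loads)).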
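The round trip can be reproduced with text that is already correctly decoded. This is a sketch under that assumption, written for Python 3; in your case the second loads would only succeed once the encoding problem is out of the way.

```python
import json

json_text = u'{"id": null, "fave": "N"}'  # assumed to be properly decoded text

json_dumps = json.dumps(json_text)   # a JSON string literal wrapping the text
json_loads = json.loads(json_dumps)  # unwraps it: same text back, not a dict
round_trip_is_noop = (json_loads == json_text)

data = json.loads(json_loads)        # parsing the text itself gives the dict
print(type(json_loads).__name__, type(data).__name__)
```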

Answer 2

I am not sure what your error is. The following code runs as expected for me:

import json

strdata = """
{
    "speech": {
        "text": "<p>Lords</p><p>We are all in the same boat</p><p>It is time for us to help</p>",
        "id": null,
        "doc_id": null,
        "fave": "N",
        "system": "2015-09-24 13:00:17"
    }
}
"""

data = json.loads(strdata)
print(data)

Answer 3

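This looks like an encoding problem, possibly introduced when the data was read from the file. You should pass the appropriate encoding as an argument to json.loads.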
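For the folder scenario in the question, the same idea could be applied at the point where each file is opened. This matters on newer Python 3 releases, where, as far as I know, json.loads no longer accepts an encoding argument. The sketch below is only illustrative: load_speeches is a hypothetical helper, the "speech" key comes from the sample in the question, and the demo folder, file, and UTF-16 encoding are assumptions.

```python
import io
import json
import os
import tempfile

def load_speeches(path_to_json, encoding):
    """Collect the "speech" object from every .json file in a folder."""
    rows = []
    for name in sorted(os.listdir(path_to_json)):
        if not name.endswith(".json"):
            continue
        # Decode at open time so json.load always receives text.
        with io.open(os.path.join(path_to_json, name), encoding=encoding) as f:
            rows.append(json.load(f)["speech"])
    return rows

# Demo with one throwaway UTF-16 file; folder and contents are made up.
folder = tempfile.mkdtemp()
with io.open(os.path.join(folder, "a.json"), "w", encoding="utf-16") as f:
    f.write(u'{"speech": {"fave": "N", "system": "2015-09-24 13:00:17"}}')

rows = load_speeches(folder, "utf-16")
print(rows)
```

The resulting list of dictionaries can then be passed to pandas.DataFrame(rows) if pandas is the end goal.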

3 answers

Answer 1
1 2016-10-28 11:35:57

Answer 2
0 2016-10-28 10:43:09

Answer 3
0 2016-10-28 10:54:23