
Convert string to dict

I am reading multiple JSON files and storing the contents of each one in json_text like this:

json_text = json_file.read()

When I print json_text I get the following information:

{
  "speech": {
    "text": "<p>Lords</p><p>We are all in the same boat</p><p>It is time for us to help</p>",
    "id": null,
    "doc_id": null,
    "fave": "N",
    "system": "2015-09-24 13:00:17"
 }
}
<type 'str'>

I assumed json.loads() would turn this into a dict, but that doesn't work:

ValueError: No JSON object could be decoded

Apparently loads() doesn't recognize json_text as JSON, even though it is valid JSON according to http://jsonlint.com. So I thought I'd use dumps() and then loads():

json_dumps = json.dumps(json_text)
json_loads = json.loads(json_dumps)
print json_loads, type(json_loads)

Gives:

{
  "speech": {
    "text": "<p>Lords</p><p>We are all in the same boat</p><p>It is time for us to help</p>",
    "id": null,
    "doc_id": null,
    "fave": "N",
    "system": "2015-09-24 13:00:17"
 }
}
<type 'unicode'>

I've also tried using ast and literal_eval() on json_text but then I get:

ValueError: malformed string

So, the scenario is that I have multiple JSON files in a folder. I want to load these files, take specific keys and store them in a pandas DataFrame. I've tried pd.read_json(), but it just tells me that there is something wrong with my JSON.

This is my code:

import json
import os

path_to_json = 'folder/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]

for index, js in enumerate(json_files):
    with open(os.path.join(path_to_json, js)) as json_file:
        json.load(json_file)

This gives ValueError: No JSON object could be decoded, and therefore I've tried using json_file.read() etc.

As I mentioned in the comments, json.loads will also raise a ValueError if the encoding is not ASCII-based. For example, the following json.loads call fails:

>>> json.loads(u'{"id": null}'.encode("utf16"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

One way to inspect the encoding is print(repr(json_text)), which can reveal additional bytes (as with UTF-16 here):

>>> print(repr(u'{"id": null}'.encode("utf16")))
'\xff\xfe{\x00"\x00i\x00d\x00"\x00:\x00 \x00n\x00u\x00l\x00l\x00}\x00'
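The leading '\xff\xfe' in that output is a byte-order mark. As a minimal sketch of how such a check could be automated, the helper below compares the first bytes against the BOM constants from the codecs module. The name guess_bom_encoding and the "utf-8" fallback are assumptions made for illustration, and a file without a BOM cannot be identified this way:

```python
import codecs

# Known byte-order marks. The UTF-32 marks come first because the
# UTF-32 LE mark starts with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_bom_encoding(raw, default="utf-8"):
    """Return a codec name based on the BOM at the start of raw bytes.

    Hypothetical helper: falls back to `default` when no BOM is found.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default


raw = u'{"id": null}'.encode("utf-16")
print(guess_bom_encoding(raw))
```

The returned name can then be passed to raw.decode(...) before handing the text to json.loads.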

In Python 2, json.load and json.loads both accept an encoding parameter. But that only applies to ASCII-based encodings, so for UTF-16 you get the same ValueError:

>>> json.loads(u'{"id": null}'.encode("utf16"), encoding="utf16")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/json/__init__.py", line 352, in loads
    return cls(encoding=encoding, **kw).decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

If that is the case for you (and you are sure that the issue is the encoding), you can either decode the string manually:

json_text = json_text.decode("utf16")

Or you can load the file using codecs.open:

with codecs.open(json_file_name, "r", encoding="utf16") as f:
    print(json.load(f))
    # or
    # json_text = f.read()

(Note that I'm using UTF-16 here, but that might not be the encoding in your case.)
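A similar approach that should behave the same on Python 2 and Python 3 is io.open, which decodes the file while reading. The sketch below is only an illustration: load_json_file, the sample file name and the UTF-16 default are assumptions, and the round trip writes its own test file to a temporary directory:

```python
import io
import json
import os
import tempfile


def load_json_file(path, encoding="utf-16"):
    """Open path as text in the given encoding and parse it as JSON."""
    with io.open(path, "r", encoding=encoding) as f:
        return json.load(f)


# Round trip: write a small UTF-16 file, then read it back.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.json")
with io.open(sample_path, "w", encoding="utf-16") as f:
    f.write(u'{"id": null, "fave": "N"}')

data = load_json_file(sample_path)
print(data)
```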

Judging from your JSON text, the characters themselves are all ASCII, so any ASCII-based encoding (e.g. latin-1) would still work without any decoding, because there is no difference between that JSON content encoded in ASCII, UTF-8 or latin-1.

As a side note, you dumped the text and loaded it again, and got a unicode object back. In theory (if my answer is correct) you should be able to load that result as well, i.e. json.loads(json_loads).
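To illustrate why the dumps/loads round trip returns a string rather than a dict, here is a small sketch. It assumes the text is already clean (no stray encoding bytes) and uses a shortened version of the JSON from the question:

```python
import json

json_text = '{"speech": {"id": null, "fave": "N"}}'

wrapped = json.dumps(json_text)  # a JSON string literal, inner quotes escaped
once = json.loads(wrapped)       # back to the original text, still a string
twice = json.loads(once)         # only now parsed into a dict

print(type(once), type(twice))
```

So dumps() followed by loads() only undoes itself; the actual parsing happens in the second loads() call.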

I'm not sure what your error is. This code runs as expected for me:

import json

strdata = """
{
    "speech": {
        "text": "<p>Lords</p><p>We are all in the same boat</p><p>It is time for us to help</p>",
        "id": null,
        "doc_id": null,
        "fave": "N",
        "system": "2015-09-24 13:00:17"
    }
}
"""

data = json.loads(strdata)
print(data)

It seems to be a question of encoding, which may already go wrong when reading from the file. You should pass the appropriate encoding as a parameter to json.loads.
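For the folder scenario in the question, one possible sketch is to decode each file while opening it and collect the wanted keys into a list of dicts. The function name collect_speeches, the chosen keys and the demo file are assumptions for illustration, and the encoding has to match what the files really use:

```python
import io
import json
import os
import tempfile


def collect_speeches(folder, encoding="utf-8", keys=("text", "system")):
    """Parse every .json file in folder and pick the given keys from "speech"."""
    rows = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".json"):
            continue
        with io.open(os.path.join(folder, name), "r", encoding=encoding) as f:
            speech = json.load(f)["speech"]
        rows.append({key: speech.get(key) for key in keys})
    return rows


# Demo with one UTF-16 file in a temporary folder.
folder = tempfile.mkdtemp()
with io.open(os.path.join(folder, "a.json"), "w", encoding="utf-16") as f:
    f.write(u'{"speech": {"text": "hi", "fave": "N", "system": "2015-09-24 13:00:17"}}')

rows = collect_speeches(folder, encoding="utf-16")
print(rows)
```

A list of dicts like rows can then be passed to pandas.DataFrame(rows) to build the table.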
