Python: decoding nested JSON within JSON

Question

I'm dealing with an API that unfortunately returns malformed (or rather "weirdly formed", thanks @fjarri) JSON. On the positive side, I think it may be an opportunity for me to learn something about recursion as well as JSON. It's for an app I use to log my workouts, and I'm trying to write a backup script.

I can receive the JSON fine, but even after requests.get(api_url).json() (or json.loads(requests.get(api_url).text)), one of the values is still a JSON-encoded string. Luckily, I can just json.loads() the string and it properly decodes to a dict. The specific key is predictable: timezone_id, whereas its value varies (because data has been logged in multiple timezones). For example, after decoding, it might be dumped to file as "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}", or loaded into Python as 'timezone_id': '{"name":"America/Denver","seconds":"-21600"}'.
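The double encoding can be reproduced in isolation. This is only a minimal sketch: the `payload` string is a made-up stand-in for what the API returns, trimmed down to the one problematic key.

```python
import json

# Hypothetical payload: the value of "timezone_id" is itself a JSON document
# that was serialized to a string before the outer object was serialized.
payload = r'{"timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}"}'

# First decode: the outer object becomes a dict, but the value stays a str.
outer = json.loads(payload)
print(type(outer["timezone_id"]))

# Second decode, applied to that value only: now it is a dict.
inner = json.loads(outer["timezone_id"])
print(type(inner), inner["name"])
```

So the task comes down to applying that second json.loads() wherever a timezone_id appears in the structure.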

The problem is that I'm using this API to retrieve a fair amount of data, which has several layers of dicts and lists, and the double-encoded timezone_id values occur at multiple levels.

Here's my work so far with some example data, but it seems like I'm pretty far off base.

#! /usr/bin/env python3

import json
from pprint import pprint

my_input = r"""{
    "hasMore": false,
    "checkins": [
        {
            "timestamp": 1353193745000,
            "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
            "privacy_groups": [
                "private"
            ],
            "meta": {
                "client_version": "3.0",
                "uuid": "fake_UUID"
            },
            "client_id": "fake_client_id",
            "workout_name": "Workout (Nov 17, 2012)",
            "fitness_workout_json": {
                "exercise_logs": [
                    {
                        "timestamp": 1353195716000,
                        "type": "exercise_log",
                        "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
                        "workout_log_uuid": "fake_UUID"
                    },
                    {
                        "timestamp": 1353195340000,
                        "type": "exercise_log",
                        "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
                        "workout_log_uuid": "fake_UUID"
                    }
                ]
            },
            "workout_uuid": ""
        },
        {
            "timestamp": 1354485615000,
            "user_id": "fake_ID",
            "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
            "privacy_groups": [
                "private"
            ],
            "meta": {
                "uuid": "fake_UUID"
            },
            "created": 1372023457376,
            "workout_name": "Workout (Dec 02, 2012)",
            "fitness_workout_json": {
                "exercise_logs": [
                    {
                        "timestamp": 1354485615000,
                        "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
                        "workout_log_uuid": "fake_UUID"
                    },
                    {
                        "timestamp": 1354485584000,
                        "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
                        "workout_log_uuid": "fake_UUID"
                    }
                ]
            },
            "workout_uuid": ""
        }]}"""

def recurse(obj):
    if isinstance(obj, list):
        for item in obj:
            return recurse(item)
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, str):
                try:
                    v = json.loads(v)
                except ValueError:
                    pass
                obj.update({k: v})
            elif isinstance(v, (dict, list)):
                return recurse(v)

pprint(json.loads(my_input, object_hook=recurse))

Any suggestions for a good way to json.loads() all those double-encoded values without changing the rest of the object? Many thanks in advance!

This post seems to be a good reference: Modifying Deeply-Nested Structures

Edit: This was flagged as a possible duplicate of this question. I think it's fairly different, as I've already demonstrated that using json.loads() was not working. The solution ended up requiring an object_hook, which I've never had to use when decoding JSON and which is not addressed in the prior question.

Answer 1

The object_hook in the JSON loader is called each time the loader finishes constructing a dictionary. That is, it is called first on the innermost dictionary, working outwards.

The dictionary passed to the object_hook callback is replaced by whatever that function returns.

So you don't need to recurse yourself. By its nature, the loader gives you access to the innermost objects first.
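To make the call order visible, here is a small sketch that records the keys of every dictionary the hook receives. The names `recording_hook`, `calls`, and `doc` are invented for the illustration.

```python
import json

calls = []

def recording_hook(obj):
    # Record the keys of each dict in the order the decoder hands them over.
    calls.append(sorted(obj))
    # Whatever is returned here replaces the dict in the final result.
    return obj

doc = '{"outer": {"middle": {"inner": 1}}}'
json.loads(doc, object_hook=recording_hook)

print(calls)  # [['inner'], ['middle'], ['outer']]
```

The innermost dictionary shows up first and the top-level one last, so by the time the hook sees a dictionary, everything nested inside it has already been processed.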

I think this will work for you:

def hook(obj):
    value = obj.get("timezone_id")
    # this is python 3 specific; I would check isinstance against 
    # basestring in python 2
    if value and isinstance(value, str):
        obj["timezone_id"] = json.loads(value, object_hook=hook)
    return obj
data = json.loads(my_input, object_hook=hook)

It seems to have the effect I think you're looking for when I test it.

I probably wouldn't try to decode every string value. I would call it strategically, only where you expect a double-encoded JSON object to exist. If you try to decode every string, you might accidentally decode something that is supposed to stay a string (like the string "12345" when the API intends it to be a string).
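The risk is easy to demonstrate. In this sketch, `decode_every_string` is a hypothetical hook that tries json.loads() on every string value, and the `code` field is made up to mirror the "12345" example:

```python
import json

def decode_every_string(obj):
    # Try to decode every string value, keeping it only if parsing succeeds.
    for key, value in list(obj.items()):
        if isinstance(value, str):
            try:
                obj[key] = json.loads(value)
            except ValueError:
                pass  # not valid JSON, keep the original string
    return obj

doc = '{"client_version": "3.0", "code": "12345", "workout_name": "Workout"}'
result = json.loads(doc, object_hook=decode_every_string)

print(result)
# {'client_version': 3.0, 'code': 12345, 'workout_name': 'Workout'}
```

Both "3.0" and "12345" are valid JSON documents on their own, so they silently change type, while "Workout" fails to parse and is left alone.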

Also, your existing function is more complicated than it needs to be; it might work as-is if you always returned obj (whether you update its contents or not).

Answer 2

Your main issue is that your object_hook function should not be recursing. json.loads() takes care of the recursion itself and calls your function every time it finds a dictionary (so obj will always be a dictionary). Instead, you just want to modify the problematic keys and return the dict. This should do what you are looking for:

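def flatten_hook(obj):
    # Python 3 version; on Python 2 use obj.iteritems() and basestring.
    for key, value in obj.items():
        if isinstance(value, str):
            try:
                # Decode the nested JSON, applying the same hook to its dicts.
                obj[key] = json.loads(value, object_hook=flatten_hook)
            except ValueError:
                # Not valid JSON: leave the original string untouched.
                pass
    return obj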

pprint(json.loads(my_input, object_hook=flatten_hook))

However, if you know the problematic (double-encoded) entries always take a specific form (e.g. key == 'timezone_id'), it is probably safer to call json.loads() on those keys only, as Matt Anderson suggests in his answer.
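If more than one key turns out to be double-encoded, the key-specific approach can be generalized. This is only a sketch under that assumption: `make_key_hook` is a made-up helper, and `doc` is a trimmed-down version of the example data.

```python
import json

def make_key_hook(keys):
    """Build an object_hook that decodes string values only under `keys`."""
    def hook(obj):
        for key in keys:
            value = obj.get(key)
            if isinstance(value, str):
                try:
                    obj[key] = json.loads(value, object_hook=hook)
                except ValueError:
                    pass  # not valid JSON, keep the original string
        return obj
    return hook

doc = r'''{"checkins": [{
    "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
    "meta": {"client_version": "3.0"}
}]}'''

data = json.loads(doc, object_hook=make_key_hook({"timezone_id"}))
checkin = data["checkins"][0]
print(checkin["timezone_id"]["name"])      # America/Denver
print(checkin["meta"]["client_version"])   # 3.0, still a string
```

Only the listed keys are touched, so values such as "3.0" keep their original type.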

Answer 1: 4, accepted, 2015-09-04 02:10:48

Answer 2: 1, 2015-09-04 02:19:27
