
Flatten Nested JSON in Python

I'm new to Python and quite stuck (I've gone through several other Stack Overflow questions and other sites and still can't get this to work).

I have the JSON below coming back from an API call:

{
   "results":[
      {
         "group":{
            "mediaType":"chat",
            "queueId":"67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"
         },
         "data":[
            {
               "interval":"2021-01-14T13:12:19.000Z/2022-01-14T13:12:19.000Z",
               "metrics":[
                  {
                     "metric":"nOffered",
                     "qualifier":null,
                     "stats":{
                        "max":null,
                        "min":null,
                        "count":14,
                        "count_negative":null,
                        "count_positive":null,
                        "sum":null,
                        "current":null,
                        "ratio":null,
                        "numerator":null,
                        "denominator":null,
                        "target":null
                     }
                  }
               ],
               "views":null
            }
         ]
      }
   ]
}

What I mainly want to get out of it is this (or at least something as close to it as possible):

MediaType   QueueId                                 NOffered
Chat        67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d    14

Is something like that possible? I've tried several approaches, but I either get everything out on a single line or run into various errors.

The errors you're getting suggest you missed that some of your values are actually dictionaries inside a list (a JSON array).
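For example, "results" holds a list, so indexing it directly with a string key fails; you have to pick an element (or loop over the list) first. A minimal sketch using a trimmed-down copy of the response; the variable name data is just for illustration:

```python
# Trimmed-down copy of the API response: "results" is a list of dicts.
data = {"results": [{"group": {"mediaType": "chat"}}]}

try:
    data["results"]["group"]  # a list cannot be indexed with a string key
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # list indices must be integers or slices, not str

# Index into the list first, then into the dict it contains.
media_type = data["results"][0]["group"]["mediaType"]
print(media_type)  # chat
```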

Assuming you want to flatten your JSON file to retrieve the keys mediaType, queueId and count (the last one sits inside stats).

These can be retrieved with the following sample code:

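import json

# path_to_json_file is a placeholder: point it at the saved API response
path_to_json_file = "response.json"

with open(path_to_json_file, "r") as f:
    json_dict = json.load(f)

for result in json_dict.get("results", []):
    media_type = result.get("group").get("mediaType")
    queue_id = result.get("group").get("queueId")
    # "count" sits inside the metric's "stats" dict, not on the metric itself
    n_offered = result.get("data")[0].get("metrics")[0].get("stats").get("count")
    print(media_type, queue_id, n_offered)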

If your data and metrics lists can contain more than one element, you will have to use nested for loops to retrieve every count value.
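A minimal sketch of those nested loops, assuming json_dict has the structure shown in the question. A shortened inline copy stands in for the loaded file here, and the rows list and its keys are illustrative choices:

```python
# Shortened inline copy of the API response (normally loaded with json.load).
json_dict = {
    "results": [
        {
            "group": {"mediaType": "chat", "queueId": "67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"},
            "data": [
                {"metrics": [{"metric": "nOffered", "stats": {"count": 14}}]},
            ],
        }
    ]
}

rows = []
for result in json_dict.get("results", []):
    group = result.get("group", {})
    # Each result can hold several intervals, and each interval several metrics.
    for interval in result.get("data", []):
        for metric in interval.get("metrics", []):
            rows.append({
                "mediaType": group.get("mediaType"),
                "queueId": group.get("queueId"),
                "metric": metric.get("metric"),
                "count": metric.get("stats", {}).get("count"),
            })

print(rows)
```

Using .get() with empty defaults means a missing key yields None or skips a loop instead of raising, which may or may not be what you want.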

Assuming the format of the API response is always the same, have you considered hardcoding the extraction of the data you want?

This should work, with response defined as the API output:

response = {
   "results": [
      {
         "group": {
            "mediaType": "chat",
            "queueId": "67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"
         },
         "data": [
            {
               "interval": "2021-01-14T13:12:19.000Z/2022-01-14T13:12:19.000Z",
               "metrics": [
                  {
                     "metric": "nOffered",
                     # JSON null corresponds to Python None
                     "qualifier": None,
                     "stats": {
                        "max": None,
                        "min": None,
                        "count": 14,
                        "count_negative": None,
                        "count_positive": None,
                        "sum": None,
                        "current": None,
                        "ratio": None,
                        "numerator": None,
                        "denominator": None,
                        "target": None
                     }
                  }
               ],
               "views": None
            }
         ]
      }
   ]
}
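If you paste the JSON into Python by hand like this, JSON null has to be written as a Python value. When the response arrives as raw JSON text instead, json.loads converts it for you: null becomes None, and true/false become True/False. A minimal sketch with a shortened payload; raw_text is a hypothetical name:

```python
import json

# Shortened raw JSON text, as it might come back from the API.
raw_text = (
    '{"results": [{"data": [{"metrics": [{"qualifier": null, '
    '"stats": {"count": 14, "max": null}}], "views": null}]}]}'
)

response = json.loads(raw_text)
metric = response["results"][0]["data"][0]["metrics"][0]
print(metric["qualifier"])       # None
print(metric["stats"]["count"])  # 14
```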

You can then extract the values you need as follows:

results = response["results"][0]

{
    "mediaType": results["group"]["mediaType"],
    "queueId": results["group"]["queueId"],
    "nOffered": results["data"][0]["metrics"][0]["stats"]["count"]
}

which gives:

{
    'mediaType': 'chat',
    'queueId': '67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d',
    'nOffered': 14
}
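To get from there to the table in the question, one option is to build one such dict per result and write it out with the standard csv module. A minimal sketch assuming the same response structure; the column names and the tab delimiter are illustrative choices:

```python
import csv
import io

# Shortened copy of the response with the same nesting as above.
response = {
    "results": [
        {
            "group": {"mediaType": "chat", "queueId": "67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"},
            "data": [{"metrics": [{"metric": "nOffered", "stats": {"count": 14}}]}],
        }
    ]
}

# One output row per result, using the same hardcoded paths.
rows = [
    {
        "MediaType": r["group"]["mediaType"],
        "QueueId": r["group"]["queueId"],
        "NOffered": r["data"][0]["metrics"][0]["stats"]["count"],
    }
    for r in response["results"]
]

# Write a tab-separated table to an in-memory buffer (swap in a file to save it).
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["MediaType", "QueueId", "NOffered"], delimiter="\t", lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
table = buffer.getvalue()
print(table)
```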
A more generic option is a recursive, depth-first flattener that builds a pandas DataFrame:

import pandas as pd

tree = {
   "results": [
      {
         "group": {
            "mediaType": "chat",
            "queueId": "67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"
         },
         "data": [
            {
               "interval": "2021-01-14T13:12:19.000Z/2022-01-14T13:12:19.000Z",
               "metrics": [
                  {
                     "metric": "nOffered",
                     "qualifier": None,
                     "stats": {
                        "max": None,
                        "min": None,
                        "count": 14,
                        "count_negative": None,
                        "count_positive": None,
                        "sum": None,
                        "current": None,
                        "ratio": None,
                        "numerator": None,
                        "denominator": None,
                        "target": None
                     }
                  }
               ],
               "views": None
            }
         ]
      }
   ]
}


def traverse_parser_dfs(master_tree):
    """Flatten a nested dict/list tree into a DataFrame, one row per leaf branch."""
    flatten_tree_node = []

    def _process_leaves(tree, prefix="node", tree_node=None, update=True):
        # A mutable default (dict()) would be shared between calls, so use None.
        if tree_node is None:
            tree_node = {}
        is_nested = False
        if isinstance(tree, dict):
            # First pass: scalars and nested dicts are added to the current row.
            for k, v in tree.items():
                child_prefix = prefix + "_" + k  # new prefix per key; don't mutate `prefix`
                if isinstance(v, dict):
                    _process_leaves(v, prefix=child_prefix, tree_node=tree_node, update=False)
                elif not isinstance(v, list):
                    # Keep every scalar (str, int, float, bool, None), not only strings,
                    # so numeric values such as "count" are not dropped.
                    tree_node[child_prefix] = v
            # Second pass: every list element starts its own row,
            # inheriting a copy of the columns collected so far.
            for k, v in tree.items():
                if isinstance(v, list):
                    is_nested = True
                    for leave in v:
                        _process_leaves(leave, prefix=prefix + "_" + k, tree_node=tree_node.copy())
            if not is_nested and update:
                flatten_tree_node.append(tree_node)

    _process_leaves(master_tree)
    df = pd.DataFrame(flatten_tree_node)
    # Replace any "@" or "#" in the column names with "_".
    df.columns = df.columns.str.replace("@", "_")
    df.columns = df.columns.str.replace("#", "_")
    return df


df = traverse_parser_dfs(tree)
print(df.shape)
print(df.loc[0, "node_results_data_metrics_stats_count"])


(1, 17)
14
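If pandas is not available, the same depth-first idea can be sketched in plain Python: scalars become prefixed columns, nested dicts extend the prefix, and every list element starts a new row that inherits the columns collected so far. The function names flatten_rows and _collect and the "_" separator are illustrative choices, and this sketch keeps every scalar value, including numbers and None:

```python
def _collect(d, prefix, row, lists):
    """Add scalars from dict d (and its nested dicts) to row; gather lists for later."""
    for key, value in d.items():
        name = f"{prefix}_{key}"
        if isinstance(value, dict):
            _collect(value, name, row, lists)
        elif isinstance(value, list):
            lists.append((name, value))
        else:
            row[name] = value


def flatten_rows(node, prefix="node", row=None):
    """Yield one flat dict per leaf branch of a nested dict/list tree."""
    row = dict(row or {})  # copy so sibling branches don't share columns
    if not isinstance(node, dict):
        # A scalar directly inside a list is stored under the list's prefix.
        row[prefix] = node
        yield row
        return
    lists = []
    _collect(node, prefix, row, lists)
    branches = [(name, item) for name, items in lists for item in items]
    if not branches:
        yield row
        return
    for name, item in branches:
        yield from flatten_rows(item, name, row)


# Shortened copy of the tree from the question.
tree = {
    "results": [
        {
            "group": {"mediaType": "chat", "queueId": "67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"},
            "data": [
                {
                    "interval": "2021-01-14T13:12:19.000Z/2022-01-14T13:12:19.000Z",
                    "metrics": [{"metric": "nOffered", "qualifier": None, "stats": {"count": 14, "max": None}}],
                    "views": None,
                }
            ],
        }
    ]
}

rows = list(flatten_rows(tree))
print(rows)
```

Passing rows to pd.DataFrame(rows) should give a frame comparable to the pandas version, with one row per metric.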

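Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original URL. For any questions, contact yoyou2525@163.com.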

 