Flatten Nested JSON in Python
I'm new to Python and I'm quite stuck: I've gone through multiple other Stack Overflow answers and other sites and still can't get this to work.
I have the below JSON coming out of an API connection:
{
  "results": [
    {
      "group": {
        "mediaType": "chat",
        "queueId": "67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"
      },
      "data": [
        {
          "interval": "2021-01-14T13:12:19.000Z/2022-01-14T13:12:19.000Z",
          "metrics": [
            {
              "metric": "nOffered",
              "qualifier": null,
              "stats": {
                "max": null,
                "min": null,
                "count": 14,
                "count_negative": null,
                "count_positive": null,
                "sum": null,
                "current": null,
                "ratio": null,
                "numerator": null,
                "denominator": null,
                "target": null
              }
            }
          ],
          "views": null
        }
      ]
    }
  ]
}
and what I'm mainly looking to get out of it is (or at least something close to):
| MediaType | QueueId | NOffered |
| --- | --- | --- |
| Chat | 67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d | 14 |
Is something like that possible? I've tried multiple things and I either get the whole of this out in one line or just get different errors.
The error you got indicates that you missed that some of your values are actually dictionaries nested inside arrays.
Assuming you want to flatten your JSON file to retrieve the following keys: `mediaType`, `queueId`, `count`.
These can be retrieved with the following sample code:
import json

# path_to_json_file should point at the saved API response
with open(path_to_json_file, 'r') as f:
    json_dict = json.load(f)

for result in json_dict.get("results"):
    media_type = result.get("group").get("mediaType")
    queue_id = result.get("group").get("queueId")
    # "count" sits inside the "stats" dictionary, not directly on the metric
    n_offered = result.get("data")[0].get("metrics")[0].get("stats").get("count")
If your `data` and `metrics` keys contain multiple entries, you will have to use a `for` loop to retrieve every `count` value accordingly.
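A minimal sketch of that loop, using a trimmed, hypothetical copy of the JSON above in place of the loaded file:

```python
# Hypothetical response trimmed to the fields used below; in practice
# json_dict comes from json.load as shown earlier.
json_dict = {
    "results": [
        {
            "group": {
                "mediaType": "chat",
                "queueId": "67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d",
            },
            "data": [
                {"metrics": [{"metric": "nOffered", "stats": {"count": 14}}]}
            ],
        }
    ]
}

rows = []
for result in json_dict.get("results"):
    group = result.get("group")
    for data in result.get("data"):          # one entry per interval
        for metric in data.get("metrics"):   # one entry per metric
            rows.append({
                "mediaType": group.get("mediaType"),
                "queueId": group.get("queueId"),
                metric.get("metric"): metric.get("stats").get("count"),
            })

print(rows)
```

Each metric becomes one flat row, so multiple intervals or metrics simply produce more entries in `rows`.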
Assuming that the format of the API response is always the same, have you considered hardcoding the extraction of the data you want?
This should work, with `response` defined as the API output:
response = {
    "results": [
        {
            "group": {
                "mediaType": "chat",
                "queueId": "67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"
            },
            "data": [
                {
                    "interval": "2021-01-14T13:12:19.000Z/2022-01-14T13:12:19.000Z",
                    "metrics": [
                        {
                            "metric": "nOffered",
                            "qualifier": 'null',
                            "stats": {
                                "max": 'null',
                                "min": 'null',
                                "count": 14,
                                "count_negative": 'null',
                                "count_positive": 'null',
                                "sum": 'null',
                                "current": 'null',
                                "ratio": 'null',
                                "numerator": 'null',
                                "denominator": 'null',
                                "target": 'null'
                            }
                        }
                    ],
                    "views": 'null'
                }
            ]
        }
    ]
}
You can extract the results as follows:
results = response["results"][0]
{
    "mediaType": results["group"]["mediaType"],
    "queueId": results["group"]["queueId"],
    "nOffered": results["data"][0]["metrics"][0]["stats"]["count"]
}
which gives
{
    'mediaType': 'chat',
    'queueId': '67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d',
    'nOffered': 14
}
import pandas as pd

tree = {
    "results": [
        {
            "group": {
                "mediaType": "chat",
                "queueId": "67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"
            },
            "data": [
                {
                    "interval": "2021-01-14T13:12:19.000Z/2022-01-14T13:12:19.000Z",
                    "metrics": [
                        {
                            "metric": "nOffered",
                            "qualifier": "null",
                            "stats": {
                                "max": "null",
                                "min": "null",
                                "count": 14,
                                "count_negative": "null",
                                "count_positive": "null",
                                "sum": "null",
                                "current": "null",
                                "ratio": "null",
                                "numerator": "null",
                                "denominator": "null",
                                "target": "null"
                            }
                        }
                    ],
                    "views": "null"
                }
            ]
        }
    ]
}

def traverse_parser_dfs(master_tree):
    """Depth-first traversal that flattens every leaf into a prefixed column."""
    flatten_tree_node = []

    def _process_leaves(tree, prefix="node", tree_node=None, update=True):
        if tree_node is None:  # avoid a shared mutable default argument
            tree_node = {}
        is_nested = False
        if isinstance(tree, dict):
            # First pass: scalar leaves and nested dicts.
            for k in tree.keys():
                if isinstance(tree[k], dict):
                    _process_leaves(tree[k], prefix=prefix + "_" + k,
                                    tree_node=tree_node, update=False)
                elif not isinstance(tree[k], list):
                    # Capture every scalar leaf (str, int, float, None), not
                    # just strings -- otherwise the int "count" is dropped.
                    tree_node[prefix + "_" + k] = tree[k]
            # Second pass: lists fan out into one row per element.
            for k in tree.keys():
                if isinstance(tree[k], list):
                    is_nested = True
                    for leave in tree[k]:
                        _process_leaves(leave, prefix=prefix + "_" + k,
                                        tree_node=tree_node.copy())
        if not is_nested and update:
            flatten_tree_node.append(tree_node)

    _process_leaves(master_tree)
    df = pd.DataFrame(flatten_tree_node)
    df.columns = df.columns.str.replace("@", "_")
    df.columns = df.columns.str.replace("#", "_")
    return df

print(traverse_parser_dfs(tree))

  node_results_group_mediaType  ...  node_results_data_metrics_stats_target
0                         chat  ...                                    null

[1 rows x 17 columns]