Get nested JSON from pandas dataframe grouped by multiple columns
I have a pandas dataframe:
import pandas as pd

d = {'key': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'crow', 'crow', 'crow', 'crow'],
     'date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02'],
     'class': [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
     'count': [12, 3, 5, 5, 3, 1, 4, 1, 7, 3, 8, 2],
     'percent': [.8, .2, .5, .5, .75, .25, .8, .2, .7, .3, .8, .2]
}
df = pd.DataFrame(data=d)
df
     key        date  class  count  percent
0    foo  2021-01-01      1     12     0.80
1    foo  2021-01-01      2      3     0.20
2    foo  2021-01-02      1      5     0.50
3    foo  2021-01-02      2      5     0.50
4    bar  2021-01-01      1      3     0.75
5    bar  2021-01-01      2      1     0.25
6    bar  2021-01-02      1      4     0.80
7    bar  2021-01-02      2      1     0.20
8   crow  2021-01-01      1      7     0.70
9   crow  2021-01-01      2      3     0.30
10  crow  2021-01-02      1      8     0.80
11  crow  2021-01-02      2      2     0.20
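I want to create a nested JSON file grouped by key and date, where count is a list containing, for each day, the sum of the counts for that key, and percent (shown as predictions in the output below) is a list of lists containing each class's share of that day's total. There needs to be one list per day, holding the percentage for each class.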
[
  [
    {
      "key": "foo",
      "count": [15, 10],
      "predictions": [[0.80, 0.20], [0.50, 0.50]]
    },
    {
      "key": "bar",
      "count": [4, 5],
      "predictions": [[0.75, 0.25], [0.80, 0.20]]
    },
    {
      "key": "crow",
      "count": [10, 10],
      "predictions": [[0.70, 0.30], [0.80, 0.20]]
    }
  ]
]
So far I have:
import json
# one list of row dicts per (key, date) group, serialized as a JSON array of arrays
dfj = df.groupby(["key", "date"]).apply(lambda x: x.to_dict("records")).to_json(orient="records")
print(json.dumps(json.loads(dfj), indent=2, sort_keys=True))
which returns:
[
[
{
"class": 1,
"count": 3,
"date": "2021-01-01",
"key": "bar",
"percent": 0.75
},
{
"class": 2,
"count": 1,
"date": "2021-01-01",
"key": "bar",
"percent": 0.25
}
],
[
{
"class": 1,
"count": 4,
"date": "2021-01-02",
"key": "bar",
"percent": 0.8
},
{
"class": 2,
"count": 1,
"date": "2021-01-02",
"key": "bar",
"percent": 0.2
}
],
[
{
"class": 1,
"count": 7,
"date": "2021-01-01",
"key": "crow",
"percent": 0.7
},
{
"class": 2,
"count": 3,
"date": "2021-01-01",
"key": "crow",
"percent": 0.3
}
],
[
{
"class": 1,
"count": 8,
"date": "2021-01-02",
"key": "crow",
"percent": 0.8
},
{
"class": 2,
"count": 2,
"date": "2021-01-02",
"key": "crow",
"percent": 0.2
}
],
[
{
"class": 1,
"count": 12,
"date": "2021-01-01",
"key": "foo",
"percent": 0.8
},
{
"class": 2,
"count": 3,
"date": "2021-01-01",
"key": "foo",
"percent": 0.2
}
],
[
{
"class": 1,
"count": 5,
"date": "2021-01-02",
"key": "foo",
"percent": 0.5
},
{
"class": 2,
"count": 5,
"date": "2021-01-02",
"key": "foo",
"percent": 0.5
}
]
]
Any help would be appreciated. Thank you.
You can use:
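# named aggregations: output column -> (input column, aggregation function)
d = {'count': ('count', 'sum'), 'predictions': ('percent', list)}
# aggregate per (key, date), then collect the per-date values into lists per key
g = df.groupby(['key', 'date']).agg(**d).groupby(level=0).agg(list)
# one dict per key, with the key itself added back as a field
dct = [{'key': k, **v} for k, v in g.to_dict('index').items()]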
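Details:
1. groupby the given dataframe on key and date, and agg using the dictionary d.
2. groupby the aggregated frame from step 1 on level=0, and agg using list.
3. Finally, use to_dict with orient='index' to convert the frame from step 2 into a dictionary, then use a list comprehension to add the key to each dictionary.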
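The steps above can be sketched with each intermediate frame given its own name, which makes it easier to inspect what every stage produces. The names step1, step2 and records are only illustrative, and the sample data is rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question, written compactly.
df = pd.DataFrame({
    "key": ["foo"] * 4 + ["bar"] * 4 + ["crow"] * 4,
    "date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"] * 3,
    "class": [1, 2] * 6,
    "count": [12, 3, 5, 5, 3, 1, 4, 1, 7, 3, 8, 2],
    "percent": [.8, .2, .5, .5, .75, .25, .8, .2, .7, .3, .8, .2],
})

# Step 1: one row per (key, date). Sum the counts and collect the
# class percentages of that day into a list.
step1 = df.groupby(["key", "date"]).agg(
    count=("count", "sum"),
    predictions=("percent", list),
)

# Step 2: one row per key. Each column becomes a list with one entry per date.
step2 = step1.groupby(level=0).agg(list)

# Step 3: index-oriented dict ({key: {column: value}}), then fold the key
# back into each record.
records = [{"key": k, **v} for k, v in step2.to_dict("index").items()]

print(step1)
print(step2)
print(records)
```

Printing step1 and step2 shows that the nesting comes from applying list twice: once within each day, and once across the days of each key.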
Result:
[{'key': 'bar', 'count': [4, 5], 'predictions': [[0.75, 0.25], [0.8, 0.2]]},
{'key': 'crow', 'count': [10, 10], 'predictions': [[0.7, 0.3], [0.8, 0.2]]},
{'key': 'foo', 'count': [15, 10], 'predictions': [[0.8, 0.2], [0.5, 0.5]]}]
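The result above is a Python list ordered alphabetically by key, whereas the question asks for JSON with foo first. A possible variation, assuming first-appearance order is what is wanted, is to pass sort=False to both groupby calls and let pandas serialize the frame with to_json. The names g, json_text and parsed are illustrative only.

```python
import json
import pandas as pd

# Sample data from the question, written compactly.
df = pd.DataFrame({
    "key": ["foo"] * 4 + ["bar"] * 4 + ["crow"] * 4,
    "date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"] * 3,
    "class": [1, 2] * 6,
    "count": [12, 3, 5, 5, 3, 1, 4, 1, 7, 3, 8, 2],
    "percent": [.8, .2, .5, .5, .75, .25, .8, .2, .7, .3, .8, .2],
})

# sort=False keeps the groups in order of first appearance (foo, bar, crow)
# instead of sorting them by key.
g = (
    df.groupby(["key", "date"], sort=False)
      .agg(count=("count", "sum"), predictions=("percent", list))
      .groupby(level=0, sort=False)
      .agg(list)
)

# reset_index() turns the key index back into a column, so that
# orient="records" emits one JSON object per key.
json_text = g.reset_index().to_json(orient="records")

# Round-trip through the json module to check the text and pretty-print it.
parsed = json.loads(json_text)
print(json.dumps(parsed, indent=2))
```

To write a file instead of a string, a path can be given as the first argument to to_json. Note that this yields a single list of objects, not the doubly nested list shown in the question.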