Pandas group by to JSON format
I have a pandas DataFrame with columns ['region', 'country'... etc]. I want to get the JSON format below from the df.
[{
    "total_count": 400,
    "geo": "Total",
    "drill_through": [{
        "total_count": 180,
        "geo": "EURO",
        "drill_through": [{
            "total_count": 100,
            "geo": "UK",
            "drill_through": []
        }, {
            "total_count": 80,
            "geo": "Ireland",
            "drill_through": []
        }]
    }, {
        "total_count": 130,
        "geo": "AMR",
        "drill_through": [{
            "total_count": 20,
            "geo": "Mexico",
            "drill_through": []
        }, {
            "total_count": 110,
            "geo": "California",
            "drill_through": []
        }]
    }, {
        "total_count": 90,
        "geo": "APAC",
        "drill_through": [{
            "total_count": 90,
            "geo": "Japan",
            "drill_through": []
        }]
    }]
}]
After the group by, I get the DataFrame below:
df.groupby(['region', 'country']).size().reset_index(name='count')
Output of the above group by:
region  country     count
EURO    UK          100
        Ireland     80
AMR     Mexico      20
        California  110
APAC    Japan       90
How can I achieve the JSON format above? Thank you in advance.
I'm attempting to answer this without the full context of your starting DataFrame structure. I've created a simple DataFrame as a starting point and show how to get something like the nested structure you are looking for.
Note that I left out the "drill_through" element on the country level, which you showed as an empty array, because I'm not sure what you would include there as children of a country. If you really want it, it should be trivial to add an empty array to each country element when creating it.
import pandas as pd
import json

df = pd.DataFrame(
    data=[
        ("EURO", "UK", 100),
        ("EURO", "Ireland", 80),
        ("AMR", "Mexico", 20),
        ("AMR", "California", 110),
        ("APAC", "Japan", 90)
    ],
    columns=["region", "country", "total_count"]
)

# First get the regions into a list of dictionary objects
regions = df[["region", "total_count"]].groupby("region").sum()
regions["geo"] = regions.index.values
regions = regions.to_dict(orient="records")

# Now add the countries to each region dictionary
for region in regions:
    countries = df[df["region"] == region["geo"]].drop("region", axis=1).groupby("country").sum()
    countries["geo"] = countries.index.values
    region["drill_through"] = countries.to_dict(orient="records")

# Serialize the list of regions as JSON
json_str = json.dumps(regions)
print(json_str)
Output:
[
    {
        "total_count": 130,
        "geo": "AMR",
        "drill_through": [
            {
                "total_count": 110,
                "geo": "California"
            },
            {
                "total_count": 20,
                "geo": "Mexico"
            }
        ]
    },
    {
        "total_count": 90,
        "geo": "APAC",
        "drill_through": [
            {
                "total_count": 90,
                "geo": "Japan"
            }
        ]
    },
    {
        "total_count": 180,
        "geo": "EURO",
        "drill_through": [
            {
                "total_count": 80,
                "geo": "Ireland"
            },
            {
                "total_count": 100,
                "geo": "UK"
            }
        ]
    }
]
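If you need the exact shape from the question, with a "Total" root element and an empty "drill_through" array on each country, a minimal sketch along the same lines could look like the following. The helper name build_geo_tree is hypothetical, and the sample DataFrame is the same assumed starting point as above, not your real data. The int() calls are there so that every count is a plain Python integer that json.dumps can serialize.

```python
import json

import pandas as pd


def build_geo_tree(df):
    """Build the nested Total -> region -> country list (hypothetical helper)."""
    region_nodes = []
    # sort=False keeps regions and countries in order of first appearance
    for region, region_df in df.groupby("region", sort=False):
        country_totals = region_df.groupby("country", sort=False)["total_count"].sum()
        country_nodes = [
            # Countries are leaves, so their drill_through stays empty
            {"total_count": int(count), "geo": country, "drill_through": []}
            for country, count in country_totals.items()
        ]
        region_nodes.append(
            {
                "total_count": int(region_df["total_count"].sum()),
                "geo": region,
                "drill_through": country_nodes,
            }
        )
    # Wrap all regions in a single "Total" root element
    return [
        {
            "total_count": int(df["total_count"].sum()),
            "geo": "Total",
            "drill_through": region_nodes,
        }
    ]


# Assumed sample data, same as in the answer above
df = pd.DataFrame(
    data=[
        ("EURO", "UK", 100),
        ("EURO", "Ireland", 80),
        ("AMR", "Mexico", 20),
        ("AMR", "California", 110),
        ("APAC", "Japan", 90),
    ],
    columns=["region", "country", "total_count"],
)

tree = build_geo_tree(df)
print(json.dumps(tree, indent=4))
```

If your data still has one row per record, as in the question, you could first produce the counts with df.groupby(['region', 'country']).size().reset_index(name='total_count') and pass the result to the helper.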