
Pandas group by to JSON format

I have a pandas DataFrame with columns ['region', 'country', ... etc]. I want to get the JSON format below from the df.

[{
    "total_count": 400,
    "geo": "Total",
    "drill_through": [{
        "total_count": 180,
        "geo": "EURO",
        "drill_through": [{
            "total_count": 100,
            "geo": "UK",
            "drill_through": []
        }, {
            "total_count": 80,
            "geo": "Ireland",
            "drill_through": []
        }]
    }, {
        "total_count": 130,
        "geo": "AMR",
        "drill_through": [{
            "total_count": 20,
            "geo": "Mexico",
            "drill_through": []
        }, {
            "total_count": 110,
            "geo": "California",
            "drill_through": []
        }]
    }, {
        "total_count": 90,
        "geo": "APAC",
        "drill_through": [{
            "total_count": 90,
            "geo": "Japan",
            "drill_through": []
        }]
    }]
}]

After grouping, I get the DataFrame below:

df.groupby(['region', 'country']).size().reset_index(name='count')

Output of the group by above:

region      country         count
EURO        UK              100
            Ireland         80
AMR         Mexico          20
            California      110
APAC        Japan           90

How can I achieve the JSON format above? Thank you in advance.

I'm attempting to answer this without the full context of your starting DataFrame structure. I've created a simple DataFrame as a starting point and show how to get something like the nested structure you are looking for.

Note that I left out the "drill_through" element on the country level, which you showed as an empty array, because I'm not sure what you would include there as children of a country. If you really want it, it should be trivial to add an empty array to each country element when creating them.

import pandas as pd
import json

df = pd.DataFrame(
    data=[
        ("EURO","UK",100),
        ("EURO","Ireland",80),
        ("AMR","Mexico",20),
        ("AMR","California",110),
        ("APAC","Japan",90)
    ], 
    columns=["region","country","total_count"]
)

# First, get the regions into a list of dictionaries
regions = df[["region", "total_count"]].groupby("region").sum()
regions["geo"] = regions.index.values
regions = regions.to_dict(orient="records") 

#now add the countries to each region dictionary
for region in regions:
    countries = df[df["region"] == region["geo"]].drop("region", axis=1).groupby("country").sum()
    countries["geo"] = countries.index.values
    region["drill_through"] = countries.to_dict(orient="records")
    
# Serialize the list of regions as JSON, indented for readability
json_str = json.dumps(regions, indent=4)

print(json_str)

Output:

[
    {
        "total_count": 130,
        "geo": "AMR",
        "drill_through": [
            {
                "total_count": 110,
                "geo": "California"
            },
            {
                "total_count": 20,
                "geo": "Mexico"
            }
        ]
    },
    {
        "total_count": 90,
        "geo": "APAC",
        "drill_through": [
            {
                "total_count": 90,
                "geo": "Japan"
            }
        ]
    },
    {
        "total_count": 180,
        "geo": "EURO",
        "drill_through": [
            {
                "total_count": 80,
                "geo": "Ireland"
            },
            {
                "total_count": 100,
                "geo": "UK"
            }
        ]
    }
]
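
If you want the result to match the format in the question more closely, including the top-level "Total" node and an empty "drill_through" list on each country, something like the sketch below might work. The helper name build_tree is my own invention, not a pandas function. It assumes the same three columns as the sample DataFrame above. I pass sort=False to groupby so regions and countries keep their original order instead of being sorted alphabetically, and I cast the sums to int because the json module cannot serialize NumPy integers.

```python
import json

import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame(
    data=[
        ("EURO", "UK", 100),
        ("EURO", "Ireland", 80),
        ("AMR", "Mexico", 20),
        ("AMR", "California", 110),
        ("APAC", "Japan", 90),
    ],
    columns=["region", "country", "total_count"],
)


def build_tree(frame):
    """Hypothetical helper: nest countries under regions under one "Total" node."""
    regions = []
    # sort=False keeps groups in order of first appearance.
    for region_name, region_rows in frame.groupby("region", sort=False):
        country_sums = region_rows.groupby("country", sort=False)["total_count"].sum()
        countries = [
            # Countries are leaves, so their drill_through list stays empty.
            {"total_count": int(count), "geo": country, "drill_through": []}
            for country, count in country_sums.items()
        ]
        regions.append({
            "total_count": int(region_rows["total_count"].sum()),
            "geo": region_name,
            "drill_through": countries,
        })
    # Wrap all regions in a single top-level "Total" element.
    return [{
        "total_count": int(frame["total_count"].sum()),
        "geo": "Total",
        "drill_through": regions,
    }]


tree = build_tree(df)
json_str = json.dumps(tree, indent=4)
print(json_str)
```

The dictionaries are built by hand here rather than with to_dict(orient="records"), which makes it easier to control the key order and to attach the empty list at the leaf level.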
