
Pandas group by to json format

I have a pandas DataFrame with columns ['region', 'country', ...]. I want to produce the JSON format below from it.

[{
    "total_count": 400,
    "geo": "Total",
    "drill_through": [{
        "total_count": 180,
        "geo": "EURO",
        "drill_through": [{
            "total_count": 100,
            "geo": "UK",
            "drill_through": []
        }, {
            "total_count": 80,
            "geo": "Ireland",
            "drill_through": []
        }]
    }, {
        "total_count": 130,
        "geo": "AMR",
        "drill_through": [{
            "total_count": 20,
            "geo": "Mexico",
            "drill_through": []
        }, {
            "total_count": 110,
            "geo": "California",
            "drill_through": []
        }]
    }, {
        "total_count": 90,
        "geo": "APAC",
        "drill_through": [{
            "total_count": 90,
            "geo": "Japan",
            "drill_through": []
        }]
    }]
}]

After grouping, I get the DataFrame below:

df.groupby(['region', 'country']).size().reset_index(name='count')

Output of the above group by:

region      country         count
EURO        UK              100
            Ireland         80
AMR         Mexico          20
            California      110
APAC        Japan           90

How can I achieve the JSON format above? Thank you in advance.

I'm answering this without the full context of your starting DataFrame structure, so I've created a simple DataFrame as a starting point to show how to get something like the nested structure you are looking for.

Note that I left out the "drill_through" element on the country level, which you showed as an empty array, because I'm not sure what you would include there as children of a country. It should be trivial to add an empty array to each country element when creating it, if you really want that. I also left out the top-level "Total" element.

import pandas as pd
import json

df = pd.DataFrame(
    data=[
        ("EURO","UK",100),
        ("EURO","Ireland",80),
        ("AMR","Mexico",20),
        ("AMR","California",110),
        ("APAC","Japan",90)
    ], 
    columns=["region","country","total_count"]
)

# First, sum the counts per region and turn each region into a dictionary
regions = df[["region", "total_count"]].groupby("region").sum()
regions["geo"] = regions.index.values
regions = regions.to_dict(orient="records")

# Now attach the countries of each region under "drill_through"
for region in regions:
    countries = df[df["region"] == region["geo"]].drop("region", axis=1).groupby("country").sum()
    countries["geo"] = countries.index.values
    region["drill_through"] = countries.to_dict(orient="records")

# Serialize the list of regions as JSON (indent=4 pretty-prints the result)
json_str = json.dumps(regions, indent=4)

print(json_str)

Output:

[
    {
        "total_count": 130,
        "geo": "AMR",
        "drill_through": [
            {
                "total_count": 110,
                "geo": "California"
            },
            {
                "total_count": 20,
                "geo": "Mexico"
            }
        ]
    },
    {
        "total_count": 90,
        "geo": "APAC",
        "drill_through": [
            {
                "total_count": 90,
                "geo": "Japan"
            }
        ]
    },
    {
        "total_count": 180,
        "geo": "EURO",
        "drill_through": [
            {
                "total_count": 80,
                "geo": "Ireland"
            },
            {
                "total_count": 100,
                "geo": "UK"
            }
        ]
    }
]
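
If you do want the exact shape from the question, including the top-level "Total" element and the empty "drill_through" array on each country, something along the following lines might work. This is only a sketch under a few assumptions: the helper name build_tree is made up for illustration, the input already has one row per (region, country) pair as in the sample DataFrame above, and sort=False is used so that regions keep their order of first appearance instead of being sorted alphabetically.

```python
import json

import pandas as pd

df = pd.DataFrame(
    data=[
        ("EURO", "UK", 100),
        ("EURO", "Ireland", 80),
        ("AMR", "Mexico", 20),
        ("AMR", "California", 110),
        ("APAC", "Japan", 90),
    ],
    columns=["region", "country", "total_count"],
)


def build_tree(frame):
    """Hypothetical helper: nest countries under regions under one Total node."""
    regions = []
    # sort=False keeps regions in the order they first appear in the frame
    for region, group in frame.groupby("region", sort=False):
        # Assumes one row per country; int() converts NumPy integers
        # to plain Python ints so json.dumps can serialize them
        countries = [
            {"total_count": int(count), "geo": country, "drill_through": []}
            for country, count in zip(group["country"], group["total_count"])
        ]
        regions.append(
            {
                "total_count": int(group["total_count"].sum()),
                "geo": region,
                "drill_through": countries,
            }
        )
    # Wrap everything in a single top-level "Total" element
    return [
        {
            "total_count": int(frame["total_count"].sum()),
            "geo": "Total",
            "drill_through": regions,
        }
    ]


tree = build_tree(df)
json_str = json.dumps(tree, indent=4)
print(json_str)
```

If your real data has several rows per country (for example, one row per record before counting), aggregate it first, e.g. with the groupby(...).size().reset_index(name='count') call from the question, and adjust the column name accordingly.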
