Pandas group by to json format

Question

I have a pandas DataFrame with columns ['region', 'country'... etc]. I want to produce the JSON format below from it.

[{
    "total_count": 400,
    "geo": "Total",
    "drill_through": [{
        "total_count": 180,
        "geo": "EURO",
        "drill_through": [{
            "total_count": 100,
            "geo": "UK",
            "drill_through": []
        }, {
            "total_count": 80,
            "geo": "Ireland",
            "drill_through": []
        }]
    }, {
        "total_count": 130,
        "geo": "AMR",
        "drill_through": [{
            "total_count": 20,
            "geo": "Mexico",
            "drill_through": []
        }, {
            "total_count": 110,
            "geo": "California",
            "drill_through": []
        }]
    }, {
        "total_count": 90,
        "geo": "APAC",
        "drill_through": [{
            "total_count": 90,
            "geo": "Japan",
            "drill_through": []
        }]
    }]
}]

After the group by, I get the DataFrame below:

df.groupby(['region', 'country']).size().reset_index(name='count')

Output of the group by above:

region      country         count
EURO        UK              100
            Ireland         80
AMR         Mexico          20
            California      110
APAC        Japan           90

How can I achieve the JSON format above? Thank you in advance.

Answer 1

I'm answering this without knowing the exact structure of your starting DataFrame, so I've created a simple DataFrame as a starting point and show how to get something like the nested structure you are looking for.

Note that I left out the "drill_through" element at the country level, which you showed as an empty array, because I'm not sure what you would include there as children of a country. It should be trivial to add an empty array to each country element when creating it, if you really want that.

import json

import pandas as pd

df = pd.DataFrame(
    data=[
        ("EURO", "UK", 100),
        ("EURO", "Ireland", 80),
        ("AMR", "Mexico", 20),
        ("AMR", "California", 110),
        ("APAC", "Japan", 90),
    ],
    columns=["region", "country", "total_count"],
)

# First get the regions into a list of dictionaries.
# groupby sorts its keys by default, so regions come out in alphabetical order.
regions = df[["region", "total_count"]].groupby("region").sum()
regions["geo"] = regions.index.values
regions = regions.to_dict(orient="records")

# Now add the countries to each region dictionary
for region in regions:
    countries = df[df["region"] == region["geo"]].drop("region", axis=1).groupby("country").sum()
    countries["geo"] = countries.index.values
    region["drill_through"] = countries.to_dict(orient="records")

# Serialize the list of regions as JSON (indent=4 gives the pretty-printed layout)
json_str = json.dumps(regions, indent=4)

print(json_str)

Output:

[
    {
        "total_count": 130,
        "geo": "AMR",
        "drill_through": [
            {
                "total_count": 110,
                "geo": "California"
            },
            {
                "total_count": 20,
                "geo": "Mexico"
            }
        ]
    },
    {
        "total_count": 90,
        "geo": "APAC",
        "drill_through": [
            {
                "total_count": 90,
                "geo": "Japan"
            }
        ]
    },
    {
        "total_count": 180,
        "geo": "EURO",
        "drill_through": [
            {
                "total_count": 80,
                "geo": "Ireland"
            },
            {
                "total_count": 100,
                "geo": "UK"
            }
        ]
    }
]
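
If you do want the exact shape from the question, with the top-level "Total" element and an empty "drill_through" list on each country, something along these lines might work. This is only a sketch under a few assumptions: the DataFrame already has a numeric total_count column as in the example above, build_tree is a hypothetical helper name I made up, and sort=False is used so that regions and countries keep their order of first appearance instead of being sorted alphabetically.

```python
import json

import pandas as pd

df = pd.DataFrame(
    data=[
        ("EURO", "UK", 100),
        ("EURO", "Ireland", 80),
        ("AMR", "Mexico", 20),
        ("AMR", "California", 110),
        ("APAC", "Japan", 90),
    ],
    columns=["region", "country", "total_count"],
)


def build_tree(frame):
    """Nest countries under regions, and regions under a single "Total" node.

    Hypothetical helper: assumes columns region, country and total_count.
    """
    regions = []
    # sort=False keeps groups in order of first appearance in the frame.
    for region_name, group in frame.groupby("region", sort=False):
        # Sum per country in case a country appears on more than one row.
        country_totals = group.groupby("country", sort=False)["total_count"].sum()
        countries = [
            # int() converts NumPy integers to plain ints for json.dumps.
            {"total_count": int(count), "geo": country, "drill_through": []}
            for country, count in country_totals.items()
        ]
        regions.append(
            {
                "total_count": int(group["total_count"].sum()),
                "geo": region_name,
                "drill_through": countries,
            }
        )
    # Wrap everything in one top-level element carrying the grand total.
    return [
        {
            "total_count": int(frame["total_count"].sum()),
            "geo": "Total",
            "drill_through": regions,
        }
    ]


tree = build_tree(df)
print(json.dumps(tree, indent=4))
```

If you start from raw rows rather than pre-aggregated counts, you could first build the total_count column with df.groupby(['region', 'country']).size().reset_index(name='total_count') and pass the result to the helper.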

solution1
0 2021-12-27 02:19:26
