简体   繁体   中英

Convert pandas dataframe to aggregated nested json-like structure

Objective : convert pandas dataframe to an aggregated json-like object.

The "json-like" object contains an aggregate (sum) of the values for each Group and Category as weights.

Current state:

df = pd.DataFrame({'group': ["Group 1", "Group 1", "Group 2", "Group 3", "Group 3", "Group 3"], 
                   'category': ["Category 1.1", "Category 1.2", "Category 2.1", "Category 3.1", "Category 3.2", "Category 3.3"],
                   'value': [2, 4, 5, 1, 4, 5]
                   })

Structure:

>>> df[['group','category','value']]
     group      category  value
0  Group 1  Category 1.1      2
1  Group 1  Category 1.2      4
2  Group 2  Category 2.1      5
3  Group 3  Category 3.1      1
4  Group 3  Category 3.2      4
5  Group 3  Category 3.3      5

Desired output:

{"groups": [
    {"label": "Group 1",
      "weight": 6,
      "groups": [
        {"label": "Category 1.1",
          "weight": 2,
          "groups": [] },
        {"label": "Category 1.2",
          "weight": 4,
          "groups": [] }
      ] },
    {"label": "Group 2",
      "weight": 5,
      "groups": [{
          "label": "Category 2.1",
          "weight": 5,
          "groups": []
        } ] },
    {"label": "Group 3",
      "weight": 10,
      "groups": [{
          "label": "Category 3.1",
          "weight": 1,
          "groups": []
        },
        {"label": "Category 3.2",
          "weight": 4,
          "groups": []
        },
        {"label": "Category 3.3",
          "weight": 5,
          "groups": []
        } ]
    } ]
}

Tried so far:

pd.pivot_table(df, index=['group'],columns=['category'], values=['value'],aggfunc=np.sum, margins=True).stack('category')

Pivot output:

                      value
group   category           
Group 1 All             6.0
        Category 1.1    2.0
        Category 1.2    4.0
Group 2 All             5.0
        Category 2.1    5.0
Group 3 All            10.0
        Category 3.1    1.0
        Category 3.2    4.0
        Category 3.3    5.0
All     All            21.0
        Category 1.1    2.0
        Category 1.2    4.0
        Category 2.1    5.0
        Category 3.1    1.0
        Category 3.2    4.0
        Category 3.3    5.0

From there I'm stuck. The rollup "All" seems like it should be in another column and I don't want it as a "group". I've tried using to_json() with various iterations of record , values , split as args, but I can't figure out how to render the desired output.

Also tried df.groupby(['group','category']).agg({'value':'sum'}) , but I don't get the rollup sum.

Questions similar, but not quite the structure I'm seeking:

I think the below might work for you. Can't say it's pretty tho...

import numpy as np
import pandas as pd
from itertools import chain
import json

df_grouped = df.groupby(['group', 'category'])['value'].sum().reset_index()
df_grouped = df_grouped.rename(columns={'value': 'weight', 'category': 'label'})

output_object = \
    [{'label': k, 
      'weight': df_grouped.loc[v, 'weight'].sum(),
      'groups': [dict({'groups': ()}.items() | x.items()) for x in 
                 chain.from_iterable(df_grouped.iloc[v, :].groupby('label')[['label', 'weight']].\
                  apply(lambda x: x.to_dict(orient='records')).tolist())]}
      for (k, v) in df_grouped.groupby(['group'])[['label', 'weight']].groups.items()]

output_dict = {'groups': output_object}

print(output_dict)

{'groups': [{'groups': [{'groups': (), 'label': 'Category 2.1', 'weight': 5}],
   'label': 'Group 2',
   'weight': 5},
  {'groups': [{'groups': (), 'label': 'Category 1.1', 'weight': 2},
    {'groups': (), 'label': 'Category 1.2', 'weight': 4}],
   'label': 'Group 1',
   'weight': 6},
  {'groups': [{'groups': (), 'label': 'Category 3.1', 'weight': 1},
    {'groups': (), 'label': 'Category 3.2', 'weight': 4},
    {'groups': (), 'label': 'Category 3.3', 'weight': 5}],
   'label': 'Group 3',
   'weight': 10}]}

To actually get it in JSON form, I grabbed the solution from this answer :

def default(o):
    if isinstance(o, np.integer): return int(o)
    raise TypeError

output_json = json.dumps(output_json, default=default)

print(output_json)

'{"groups": [{"groups": [{"groups": [], "weight": 5, "label": "Category 2.1"}], "weight": 5, "label": "Group 2"}, {"groups": [{"groups": [], "weight": 2, "label": "Category 1.1"}, {"groups": [], "weight": 4, "label": "Category 1.2"}], "weight": 6, "label": "Group 1"}, {"groups": [{"groups": [], "weight": 1, "label": "Category 3.1"}, {"groups": [], "weight": 4, "label": "Category 3.2"}, {"groups": [], "weight": 5, "label": "Category 3.3"}], "weight": 10, "label": "Group 3"}]}'

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM