
Convert pandas dataframe to aggregated nested json-like structure

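Goal: convert a pandas dataframe into an aggregated JSON-like object.

The "JSON-like" object contains the weight (the sum of the values) for each group and each category.

Current state: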

import pandas as pd

df = pd.DataFrame({'group': ["Group 1", "Group 1", "Group 2", "Group 3", "Group 3", "Group 3"],
                   'category': ["Category 1.1", "Category 1.2", "Category 2.1", "Category 3.1", "Category 3.2", "Category 3.3"],
                   'value': [2, 4, 5, 1, 4, 5]
                   })

Structure:

>>> df[['group','category','value']]
     group      category  value
0  Group 1  Category 1.1      2
1  Group 1  Category 1.2      4
2  Group 2  Category 2.1      5
3  Group 3  Category 3.1      1
4  Group 3  Category 3.2      4
5  Group 3  Category 3.3      5

Desired output:

{"groups": [
    {"label": "Group 1",
      "weight": 6,
      "groups": [
        {"label": "Category 1.1",
          "weight": 2,
          "groups": [] },
        {"label": "Category 1.2",
          "weight": 4,
          "groups": [] }
      ] },
    {"label": "Group 2",
      "weight": 5,
      "groups": [{
          "label": "Category 2.1",
          "weight": 5,
          "groups": []
        } ] },
    {"label": "Group 3",
      "weight": 10,
      "groups": [{
          "label": "Category 3.1",
          "weight": 1,
          "groups": []
        },
        {"label": "Category 3.2",
          "weight": 4,
          "groups": []
        },
        {"label": "Category 3.3",
          "weight": 5,
          "groups": []
        } ]
    } ]
}

Tried so far:

pd.pivot_table(df, index=['group'],columns=['category'], values=['value'],aggfunc=np.sum, margins=True).stack('category')

Pivot output:

                      value
group   category           
Group 1 All             6.0
        Category 1.1    2.0
        Category 1.2    4.0
Group 2 All             5.0
        Category 2.1    5.0
Group 3 All            10.0
        Category 3.1    1.0
        Category 3.2    4.0
        Category 3.3    5.0
All     All            21.0
        Category 1.1    2.0
        Category 1.2    4.0
        Category 2.1    5.0
        Category 3.1    1.0
        Category 3.2    4.0
        Category 3.3    5.0

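From there I'm stuck. The "All" aggregate seems like it should go in a separate column, and I don't want it to appear as a "group". I have tried to_json() in various iterations, passing records, values, and split as arguments, but I can't figure out how to produce the desired output.

I also tried df.groupby(['group','category']).agg({'value':'sum'}), but that does not give me the rolled-up sums per group.

There are similar questions, but their structures are not quite the same.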

I think the following may work for you. I can't say it's very...

import numpy as np
import pandas as pd
from itertools import chain
import json

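# Sum the values per (group, category) and rename to the output field names.
df_grouped = df.groupby(['group', 'category'])['value'].sum().reset_index()
df_grouped = df_grouped.rename(columns={'value': 'weight', 'category': 'label'})

# For each group, v holds the row labels of its categories in df_grouped.
# Grouping by the scalar key 'group' (rather than ['group']) keeps k a plain
# label; newer pandas versions may return 1-tuples for list keys.
# 'groups': () is an empty tuple because the set-union merge needs hashable values.
output_object = \
    [{'label': k,
      'weight': df_grouped.loc[v, 'weight'].sum(),
      'groups': [dict({'groups': ()}.items() | x.items()) for x in
                 chain.from_iterable(df_grouped.loc[v, :].groupby('label')[['label', 'weight']].\
                  apply(lambda x: x.to_dict(orient='records')).tolist())]}
      for (k, v) in df_grouped.groupby('group')[['label', 'weight']].groups.items()]

output_dict = {'groups': output_object}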

print(output_dict)

{'groups': [{'groups': [{'groups': (), 'label': 'Category 2.1', 'weight': 5}],
   'label': 'Group 2',
   'weight': 5},
  {'groups': [{'groups': (), 'label': 'Category 1.1', 'weight': 2},
    {'groups': (), 'label': 'Category 1.2', 'weight': 4}],
   'label': 'Group 1',
   'weight': 6},
  {'groups': [{'groups': (), 'label': 'Category 3.1', 'weight': 1},
    {'groups': (), 'label': 'Category 3.2', 'weight': 4},
    {'groups': (), 'label': 'Category 3.3', 'weight': 5}],
   'label': 'Group 3',
   'weight': 10}]}
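
The same structure can also be built with two plain groupby loops, which may be easier to follow than the comprehension above. This is only a minimal sketch, assuming the df from the question; the helper name build_groups is hypothetical and not part of pandas. It converts the sums to built-in int and uses empty lists for the leaf "groups".

```python
import pandas as pd

df = pd.DataFrame({
    'group': ["Group 1", "Group 1", "Group 2", "Group 3", "Group 3", "Group 3"],
    'category': ["Category 1.1", "Category 1.2", "Category 2.1",
                 "Category 3.1", "Category 3.2", "Category 3.3"],
    'value': [2, 4, 5, 1, 4, 5],
})


def build_groups(frame):
    """Hypothetical helper: nest per-category sums under per-group totals."""
    groups = []
    # Outer loop: one entry per group, in sorted label order.
    for group_label, group_rows in frame.groupby('group', sort=True):
        # Inner aggregation: sum of 'value' for each category in this group.
        per_category = group_rows.groupby('category', sort=True)['value'].sum()
        children = [
            {'label': label, 'weight': int(weight), 'groups': []}
            for label, weight in per_category.items()
        ]
        groups.append({
            'label': group_label,
            # The group weight is the total over all of its categories.
            'weight': int(group_rows['value'].sum()),
            'groups': children,
        })
    return {'groups': groups}


nested = build_groups(df)
```

Because the weights are already built-in integers here, the resulting dict can be passed to json.dumps without a custom default handler.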

To actually get it as JSON, I took a solution from another answer that converts NumPy integers to plain Python ints:

def default(o):
    if isinstance(o, np.integer): return int(o)
    raise TypeError

output_json = json.dumps(output_dict, default=default)

print(output_json)

'{"groups": [{"groups": [{"groups": [], "weight": 5, "label": "Category 2.1"}], "weight": 5, "label": "Group 2"}, {"groups": [{"groups": [], "weight": 2, "label": "Category 1.1"}, {"groups": [], "weight": 4, "label": "Category 1.2"}], "weight": 6, "label": "Group 1"}, {"groups": [{"groups": [], "weight": 1, "label": "Category 3.1"}, {"groups": [], "weight": 4, "label": "Category 3.2"}, {"groups": [], "weight": 5, "label": "Category 3.3"}], "weight": 10, "label": "Group 3"}]}'
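
To illustrate why the default handler is needed: json.dumps does not know how to serialize NumPy integer types such as np.int64, which is what a pandas sum returns, and it calls default only for objects it cannot handle itself. A small sketch with a made-up payload (the payload dict is an illustrative assumption, not data from the question):

```python
import json
import numpy as np


def default(o):
    # json.dumps calls this only for objects it cannot serialize natively.
    if isinstance(o, np.integer):
        return int(o)
    raise TypeError(f"{type(o).__name__} is not JSON serializable")


# Illustrative payload: a NumPy integer weight and an empty tuple of children.
payload = {'label': 'Group 1', 'weight': np.int64(6), 'groups': ()}

try:
    json.dumps(payload)          # no handler: np.int64 is rejected
    failed_without_handler = False
except TypeError:
    failed_without_handler = True

# With the handler, the weight becomes a plain int; the tuple is written as [].
encoded = json.dumps(payload, default=default)
```

The empty tuple needs no special handling, since json.dumps writes tuples as JSON arrays.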
