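Convert pandas dataframe to aggregated nested json-like structure
Goal: convert a pandas DataFrame into an aggregated JSON-like object.
The JSON-like object contains the weight (the sum of the values) for each Group and each Category.
Current state: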
df = pd.DataFrame({'group': ["Group 1", "Group 1", "Group 2", "Group 3", "Group 3", "Group 3"],
                   'category': ["Category 1.1", "Category 1.2", "Category 2.1", "Category 3.1", "Category 3.2", "Category 3.3"],
                   'value': [2, 4, 5, 1, 4, 5]
                   })
Structure:
>>> df[['group','category','value']]
     group      category  value
0  Group 1  Category 1.1      2
1  Group 1  Category 1.2      4
2  Group 2  Category 2.1      5
3  Group 3  Category 3.1      1
4  Group 3  Category 3.2      4
5  Group 3  Category 3.3      5
Desired output:
{"groups": [
{"label": "Group 1",
"weight": 6,
"groups": [
{"label": "Category 1.1",
"weight": 2,
"groups": [] },
{"label": "Category 1.2",
"weight": 4,
"groups": [] }
] },
{"label": "Group 2",
"weight": 5,
"groups": [{
"label": "Category 2.1",
"weight": 5,
"groups": []
} ] },
{"label": "Group 3",
"weight": 10,
"groups": [{
"label": "Category 3.1",
"weight": 1,
"groups": []
},
{"label": "Category 3.2",
"weight": 4,
"groups": []
},
{"label": "Category 3.3",
"weight": 5,
"groups": []
} ]
} ]
}
Tried so far:
pd.pivot_table(df, index=['group'],columns=['category'], values=['value'],aggfunc=np.sum, margins=True).stack('category')
Pivot output:
                      value
group   category
Group 1 All             6.0
        Category 1.1    2.0
        Category 1.2    4.0
Group 2 All             5.0
        Category 2.1    5.0
Group 3 All            10.0
        Category 3.1    1.0
        Category 3.2    4.0
        Category 3.3    5.0
All     All            21.0
        Category 1.1    2.0
        Category 1.2    4.0
        Category 2.1    5.0
        Category 3.1    1.0
        Category 3.2    4.0
        Category 3.3    5.0
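From there I am stuck. The "All" subtotal looks like it belongs in a separate column, and I do not want it to appear as a "group". I have tried to_json() with record, values and split as the orient argument, in various combinations, but I cannot work out how to produce the desired output.
I also tried df.groupby(['group','category']).agg({'value':'sum'}), but that does not give me the rolled-up sum per group.
Similar questions, but not quite the same structure:
I think the following may work for you. Can't say it's very...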
import numpy as np
import pandas as pd
from itertools import chain
import json
# Sum the values per (group, category), then rename columns to the target keys.
df_grouped = df.groupby(['group', 'category'])['value'].sum().reset_index()
df_grouped = df_grouped.rename(columns={'value': 'weight', 'category': 'label'})
# One dict per group: k is the group label, v the row labels belonging to that group.
# Grouping by the scalar key 'group' keeps k a plain string rather than a 1-tuple.
output_object = \
    [{'label': k,
      'weight': df_grouped.loc[v, 'weight'].sum(),
      'groups': [dict({'groups': ()}.items() | x.items()) for x in
                 chain.from_iterable(df_grouped.iloc[v, :].groupby('label')[['label', 'weight']].\
                                     apply(lambda x: x.to_dict(orient='records')).tolist())]}
     for (k, v) in df_grouped.groupby('group')[['label', 'weight']].groups.items()]
output_dict = {'groups': output_object}
print(output_dict)
{'groups': [{'groups': [{'groups': (), 'label': 'Category 2.1', 'weight': 5}],
'label': 'Group 2',
'weight': 5},
{'groups': [{'groups': (), 'label': 'Category 1.1', 'weight': 2},
{'groups': (), 'label': 'Category 1.2', 'weight': 4}],
'label': 'Group 1',
'weight': 6},
{'groups': [{'groups': (), 'label': 'Category 3.1', 'weight': 1},
{'groups': (), 'label': 'Category 3.2', 'weight': 4},
{'groups': (), 'label': 'Category 3.3', 'weight': 5}],
'label': 'Group 3',
'weight': 10}]}
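To actually get it as JSON, I took the solution from the following answer:
def default(o):
    # json cannot serialize NumPy integers, so convert them to plain int.
    if isinstance(o, np.integer): return int(o)
    raise TypeError
output_json = json.dumps(output_dict, default=default)
print(output_json)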
'{"groups": [{"groups": [{"groups": [], "weight": 5, "label": "Category 2.1"}], "weight": 5, "label": "Group 2"}, {"groups": [{"groups": [], "weight": 2, "label": "Category 1.1"}, {"groups": [], "weight": 4, "label": "Category 1.2"}], "weight": 6, "label": "Group 1"}, {"groups": [{"groups": [], "weight": 1, "label": "Category 3.1"}, {"groups": [], "weight": 4, "label": "Category 3.2"}, {"groups": [], "weight": 5, "label": "Category 3.3"}], "weight": 10, "label": "Group 3"}]}'
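If the one-liner above is hard to follow, the same structure can probably be built with two plain loops. This is only a sketch, not the approach used above: the helper name to_nested is made up for illustration, sort=False is used so groups keep the order in which they first appear, and the int() casts assume the values are whole numbers (they also make the default= hook unnecessary).

```python
import json

import pandas as pd

df = pd.DataFrame({
    'group': ["Group 1", "Group 1", "Group 2", "Group 3", "Group 3", "Group 3"],
    'category': ["Category 1.1", "Category 1.2", "Category 2.1",
                 "Category 3.1", "Category 3.2", "Category 3.3"],
    'value': [2, 4, 5, 1, 4, 5],
})


def to_nested(frame):
    """Hypothetical helper: build {"groups": [...]} with group and category sums."""
    groups = []
    # sort=False keeps the groups in order of first appearance.
    for group_label, sub in frame.groupby('group', sort=False):
        # Sum per category inside this group.
        per_category = sub.groupby('category', sort=False)['value'].sum()
        children = [
            # int() turns NumPy integers into plain ints (assumes integer values).
            {'label': label, 'weight': int(weight), 'groups': []}
            for label, weight in per_category.items()
        ]
        groups.append({
            'label': group_label,
            'weight': int(sub['value'].sum()),  # rolled-up total for the group
            'groups': children,
        })
    return {'groups': groups}


nested = to_nested(df)
output_json = json.dumps(nested)
print(output_json)
```

Because every weight is already a plain int and every leaf uses an empty list, json.dumps needs no custom default here.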