Convert pandas DataFrame to a specific JSON format in Python
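I have a large dataset with more than 2,000 rows that I want to convert to a specific JSON format. I have tried this code on a sample dataset.
I tried to_json and to_dict, but they produce output in a generic format.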
import pandas as pd
from collections import defaultdict

data = [['food', 'vegatables', 10], ['food', 'fruits', 5], ['food', 'pulses', 12],
        ['cloth', 'shirts', 2], ['cloth', 'trousers', 6], ['books', 'notebook', 3],
        ['pens', 'roller', 4], ['pens', 'ball', 3]]
df = pd.DataFrame(data, columns=['Items', 'Subitem', 'Quantity'])

# Count how many rows each item and each subitem appears in
labels = defaultdict(int)
labels1 = defaultdict(int)
for cat in df["Items"]:
    labels[cat] += 1
for sub in df["Subitem"]:
    labels1[sub] += 1

# The inner comprehension loops over every subitem, not only those of item i
check = [{"item": i, "weight": labels[i],
          "groups": [{"subitem": j, "weight": labels1[j], "group": []} for j in labels1]}
         for i in labels]
check
I get output like this:
[{'item': 'food',
'weight': 3,
'groups': [{'subitem': 'vegatables', 'weight': 1, 'group': []},
{'subitem': 'fruits', 'weight': 1, 'group': []},
{'subitem': 'pulses', 'weight': 1, 'group': []},
{'subitem': 'shirts', 'weight': 1, 'group': []},
{'subitem': 'trousers', 'weight': 1, 'group': []},
{'subitem': 'notebook', 'weight': 1, 'group': []},
{'subitem': 'roller', 'weight': 1, 'group': []},
{'subitem': 'ball', 'weight': 1, 'group': []}]},
{'item': 'cloth',
'weight': 2,
'groups': [{'subitem': 'vegatables', 'weight': 1, 'group': []},
{'subitem': 'fruits', 'weight': 1, 'group': []},
{'subitem': 'pulses', 'weight': 1, 'group': []},
{'subitem': 'shirts', 'weight': 1, 'group': []},
{'subitem': 'trousers', 'weight': 1, 'group': []},
{'subitem': 'notebook', 'weight': 1, 'group': []},
{'subitem': 'roller', 'weight': 1, 'group': []},
{'subitem': 'ball', 'weight': 1, 'group': []}]},
{'item': 'books',
'weight': 1,
'groups': [{'subitem': 'vegatables', 'weight': 1, 'group': []},
{'subitem': 'fruits', 'weight': 1, 'group': []},
{'subitem': 'pulses', 'weight': 1, 'group': []},
{'subitem': 'shirts', 'weight': 1, 'group': []},
{'subitem': 'trousers', 'weight': 1, 'group': []},
{'subitem': 'notebook', 'weight': 1, 'group': []},
{'subitem': 'roller', 'weight': 1, 'group': []},
{'subitem': 'ball', 'weight': 1, 'group': []}]},
{'item': 'pens',
'weight': 2,
'groups': [{'subitem': 'vegatables', 'weight': 1, 'group': []},
{'subitem': 'fruits', 'weight': 1, 'group': []},
{'subitem': 'pulses', 'weight': 1, 'group': []},
{'subitem': 'shirts', 'weight': 1, 'group': []},
{'subitem': 'trousers', 'weight': 1, 'group': []},
{'subitem': 'notebook', 'weight': 1, 'group': []},
{'subitem': 'roller', 'weight': 1, 'group': []},
{'subitem': 'ball', 'weight': 1, 'group': []}]}]
But I want output that contains only the subitems related to each item:
[{'item': 'food',
'weight': 3,
'groups': [
{'subitem': 'vegatables', 'weight': 10, 'group': []},
{'subitem': 'fruits', 'weight': 5, 'group': []},
{'subitem': 'pulses', 'weight': 12, 'group': []}]},
{'item': 'cloth',
'weight': 2,
'groups': [
{'subitem': 'shirts', 'weight': 2, 'group': []},
{'subitem': 'trousers', 'weight': 6, 'group': []}]},
{'item': 'books',
'weight': 1,
'groups': [
{'subitem': 'notebook', 'weight': 3, 'group': []}]},
{'item': 'pens',
'weight': 2,
'groups': [
{'subitem': 'roller', 'weight': 4, 'group': []},
{'subitem': 'ball', 'weight': 3, 'group': []}]}]
Also, what should I do if I want output like this, where each item's weight is the sum of its subitems' weights?
[{'item': 'food',
'weight': 27,
'groups': [
{'subitem': 'vegatables', 'weight': 10, 'group': []},
{'subitem': 'fruits', 'weight': 5, 'group': []},
{'subitem': 'pulses', 'weight': 12, 'group': []}]},
{'item': 'cloth',
'weight': 8,
'groups': [
{'subitem': 'shirts', 'weight': 2, 'group': []},
{'subitem': 'trousers', 'weight': 6, 'group': []}]},
{'item': 'books',
'weight': 3,
'groups': [
{'subitem': 'notebook', 'weight': 3, 'group': []}]},
{'item': 'pens',
'weight': 7,
'groups': [
{'subitem': 'roller', 'weight': 4, 'group': []},
{'subitem': 'ball', 'weight': 3, 'group': []}]}]
You can combine DataFrame.groupby and DataFrame.to_dict with a list comprehension:
cols_group = ['Subitem', 'Weight', 'group']
my_list = [{'Item': item,
            'Weight': len(group),  # number of subitems under this item
            'groups': group[cols_group].to_dict('records')}
           for item, group in (df.rename(columns={'Quantity': 'Weight'})
                                 .assign(group=[[]] * len(df))
                                 .groupby('Items'))]
print(my_list)
Output:
[{'Item': 'books',
'Weight': 1,
'groups': [{'Subitem': 'notebook', 'Weight': 3, 'group': []}]},
{'Item': 'cloth',
'Weight': 2,
'groups': [{'Subitem': 'shirts', 'Weight': 2, 'group': []},
{'Subitem': 'trousers', 'Weight': 6, 'group': []}]},
{'Item': 'food',
'Weight': 3,
'groups': [{'Subitem': 'vegatables', 'Weight': 10, 'group': []},
{'Subitem': 'fruits', 'Weight': 5, 'group': []},
{'Subitem': 'pulses', 'Weight': 12, 'group': []}]},
{'Item': 'pens',
'Weight': 2,
'groups': [{'Subitem': 'roller', 'Weight': 4, 'group': []},
{'Subitem': 'ball', 'Weight': 3, 'group': []}]}]
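The answer above counts subitems for each item's weight and sorts the items alphabetically. For the second variant in the question, where each item's weight is the sum of its subitems' quantities, one possible sketch follows. The helper name build_tree and its cumulative flag are made up for this illustration and are not part of pandas; the lowercase keys and the first-appearance ordering (sort=False) are assumptions taken from the desired output in the question.

```python
import json

import pandas as pd

data = [['food', 'vegatables', 10], ['food', 'fruits', 5], ['food', 'pulses', 12],
        ['cloth', 'shirts', 2], ['cloth', 'trousers', 6], ['books', 'notebook', 3],
        ['pens', 'roller', 4], ['pens', 'ball', 3]]
df = pd.DataFrame(data, columns=['Items', 'Subitem', 'Quantity'])


def build_tree(frame, cumulative=False):
    """Hypothetical helper: nest subitems under their item.

    cumulative=False -> item weight is the number of subitems
    cumulative=True  -> item weight is the sum of subitem quantities
    """
    result = []
    # sort=False keeps items in order of first appearance (food, cloth, ...)
    for item, group in frame.groupby('Items', sort=False):
        # int() turns NumPy integers into plain ints so json.dumps accepts them
        subitems = [{'subitem': sub, 'weight': int(qty), 'group': []}
                    for sub, qty in zip(group['Subitem'], group['Quantity'])]
        weight = int(group['Quantity'].sum()) if cumulative else len(group)
        result.append({'item': item, 'weight': weight, 'groups': subitems})
    return result


counted = build_tree(df)                  # item weight = subitem count
summed = build_tree(df, cumulative=True)  # item weight = sum of quantities

# Serialize the nested list to an actual JSON string
print(json.dumps(summed, indent=2))
```

Building the inner dicts by hand, rather than through to_dict('records'), avoids the rename and assign steps and gives every subitem its own empty list. Whether that trade-off is worthwhile depends on how many columns need to be carried along.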