How to take a multi-indexed DataFrame to a nested dictionary structure?
How can you take a DataFrame that has a multi-index and turn it into a nice nested dictionary?
Here's what I've tried so far. It's close, but the keys are tuples, and I'd like to break those out into nested dictionary keys.
What I've tried:
import numpy as np
import pandas as pd

that = {'Food':['Apple','Apple','Apple','Apple','Banana','Banana','Orange','Orange'],
        'Color':['Red','Green','Yellow','Red','Red','Green','Green','Yellow'],
        'Type':['100','4','7','101','100','100','4','7'],
        'time':[np.linspace(0,10,2) for i in range(8)]}
nn = pd.DataFrame(that)
nn = nn.set_index(['Food','Color','Type'])
vv = {}
# each idx is a (Food, Color, Type) tuple, so the dict ends up flat
for idx in nn.index:
    vv[idx] = nn.loc[idx]
vv
Out[1]:
{('Apple', 'Red', '100'): time [0.0, 10.0]
Name: (Apple, Red, 100), dtype: object,
('Apple', 'Green', '4'): time [0.0, 10.0]
Name: (Apple, Green, 4), dtype: object,
('Apple', 'Yellow', '7'): time [0.0, 10.0]
Name: (Apple, Yellow, 7), dtype: object,
('Apple', 'Red', '101'): time [0.0, 10.0]
Name: (Apple, Red, 101), dtype: object,
('Banana', 'Red', '100'): time [0.0, 10.0]
Name: (Banana, Red, 100), dtype: object,
('Banana', 'Green', '100'): time [0.0, 10.0]
Name: (Banana, Green, 100), dtype: object,
('Orange', 'Green', '4'): time [0.0, 10.0]
Name: (Orange, Green, 4), dtype: object,
('Orange', 'Yellow', '7'): time [0.0, 10.0]
Name: (Orange, Yellow, 7), dtype: object}
What I want the output to look like:
vv = {'Apple':{'Red':{'100':[0,10],'101':[0,10]},
               'Green':{'4':[0,10]},
               'Yellow':{'7':[0,10]}},
      'Banana':{'Red':{'100':[0,10]},
                'Green':{'100':[0,10]}},
      'Orange':{'Green':{'4':[0,10]},
                'Yellow':{'7':[0,10]}}}
Edit: Changed the range back to 8 (it was a typo) and reduced the number of points in linspace to 2 so that the data matches the example.
Edit 2: I'm looking for a general way to do this. In particular, a colleague of mine has written a treeView model in pyqt that accepts a nested dictionary for the tree. I just want to be able to quickly transform the DataFrames I have created into the format it needs.
For those curious about how to do this in general, here you go. It's a nice little function I wrote, and it fits what I need better.
import numpy as np
import pandas as pd

that = {'Food':['Apple','Apple','Apple','Apple','Banana','Banana','Orange','Orange'],
        'Color':['Red','Green','Yellow','Red','Red','Green','Green','Yellow'],
        'Type':['100','4','7','101','100','100','4','7'],
        'time':[np.linspace(0,10,2) for i in range(8)]}
x = pd.DataFrame(that)

def NestedDict_fromDF(iDF, keyorder, values):
    if not isinstance(keyorder, list):
        keyorder = [keyorder]
    if not isinstance(values, list):
        values = [values]
    # keep only names that are real columns, without mutating the caller's lists
    keyorder = [k for k in keyorder if k in iDF]
    values = [v for v in values if v in iDF]
    rdict = {}
    if keyorder:
        ndf = iDF.set_index(keyorder)

        def makeDict(basedict, group):
            for k, g in group:
                basedict[k] = {}
                try:
                    # recurse one index level deeper
                    makeDict(basedict[k], g.droplevel(0).groupby(level=0))
                except ValueError:
                    # droplevel fails once a single index level is left: this is a leaf
                    if values:
                        basedict[k] = g[values].reset_index(drop=True)
                    else:
                        basedict[k] = []
            return basedict

        rdict = makeDict({}, ndf.groupby(level=0))
    return rdict

# 'Integer' is not a column in x, so it is silently dropped from the key order
yy = NestedDict_fromDF(x, ['Food','Color','Type','Integer'], ['time'])
{'Apple': {'Green': {'4':DataFrame},
'Red': {'100':DataFrame,
'101':DataFrame},
'Yellow': {'7':DataFrame}},
'Banana': {'Green': {'100':DataFrame},
'Red': {'100':DataFrame}},
'Orange': {'Green': {'4':DataFrame},
'Yellow': {'7':DataFrame}}}
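The leaves above are DataFrames rather than the plain lists shown in the question. If lists are wanted, the tree can be post-processed. The following is only a sketch: `leaves_to_lists` is a hypothetical helper name, the small `tree` is built by hand to stand in for the function's output, and it assumes each cell of the value column holds a NumPy array (or something `np.asarray` accepts).

```python
import numpy as np
import pandas as pd

def leaves_to_lists(tree, column):
    """Recursively replace DataFrame leaves with plain lists (hypothetical helper)."""
    out = {}
    for key, node in tree.items():
        if isinstance(node, dict):
            # inner node: keep descending
            out[key] = leaves_to_lists(node, column)
        elif isinstance(node, pd.DataFrame):
            cells = [np.asarray(v).tolist() for v in node[column]]
            # a single-row leaf is unwrapped; multi-row leaves stay a list of lists
            out[key] = cells[0] if len(cells) == 1 else cells
        else:
            out[key] = node
    return out

# hand-built stand-in for a tree with DataFrame leaves
leaf = pd.DataFrame({'time': [np.linspace(0, 10, 2)]})
tree = {'Apple': {'Red': {'100': leaf, '101': leaf}, 'Green': {'4': leaf}}}
plain = leaves_to_lists(tree, 'time')
```

Here `plain['Apple']['Red']['100']` is an ordinary Python list, which should be easier to hand to code that expects plain nested dictionaries.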
It grew too complex, too fast:
from pprint import pprint
import numpy as np
import pandas as pd

that = {'Food':['Apple','Apple','Apple','Apple','Banana','Banana','Orange','Orange'],
        'Color':['Red','Green','Yellow','Red','Red','Green','Green','Yellow'],
        'Type':['100','4','7','101','100','100','4','7'],
        'time':[np.linspace(0,10,2) for i in range(8)]}
nn = pd.DataFrame(that)
df = nn.groupby(['Food', 'Color', 'Type']).agg(list)
d = {}
# new_df maps (food, color) tuples to {'time': {Type: [array]}}
new_df = df.groupby(level=[0,1]).apply(lambda df: df.xs(df.name).to_dict()).to_dict() #[1]
for (food, color), v in new_df.items():
    if food not in d:
        d[food] = {color: {Type: time[0].tolist() for Type, time in v['time'].items()}}
    else:
        d[food][color] = {Type: time[0].tolist() for Type, time in v['time'].items()}
pprint(d)
Output:
{'Apple': {'Green': {'4': [0.0, 10.0]},
'Red': {'100': [0.0, 10.0], '101': [0.0, 10.0]},
'Yellow': {'7': [0.0, 10.0]}},
'Banana': {'Green': {'100': [0.0, 10.0]}, 'Red': {'100': [0.0, 10.0]}},
'Orange': {'Green': {'4': [0.0, 10.0]}, 'Yellow': {'7': [0.0, 10.0]}}}
[1] Taken from: DataFrame with MultiIndex to dict
Huh! Got it, finally!
from pprint import pprint
import numpy as np
import pandas as pd

that = {'Food':['Apple','Apple','Apple','Apple','Banana','Banana','Orange','Orange'],
        'Color':['Red','Green','Yellow','Red','Red','Green','Green','Yellow'],
        'Type':['100','4','7','101','100','100','4','7'],
        'time':[np.linspace(0,10,2) for i in range(8)]}
nn = pd.DataFrame(that)
nn = nn.set_index(['Food','Color','Type'])
group = nn.groupby(level=0)
# outer dict: one entry per Food; inner groupby: one entry per Color
d = {k: g.droplevel(0).groupby(level=0)
         .apply(lambda df: df.xs(df.name)['time']
                             .apply(lambda x: x.tolist()).to_dict())
         .to_dict() for k, g in group}
pprint(d)
{'Apple': {'Green': {'4': [0.0, 10.0]},
'Red': {'100': [0.0, 10.0], '101': [0.0, 10.0]},
'Yellow': {'7': [0.0, 10.0]}},
'Banana': {'Green': {'100': [0.0, 10.0]}, 'Red': {'100': [0.0, 10.0]}},
'Orange': {'Green': {'4': [0.0, 10.0]}, 'Yellow': {'7': [0.0, 10.0]}}}
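The comprehension above is tied to exactly three index levels. For any depth, one possible alternative is to walk the index tuples and create the nesting with `dict.setdefault`. This is a sketch rather than a tested recipe for every frame: `nest_from_multiindex` is a hypothetical name, it assumes a single value column, and if the same index tuple occurs more than once the last row wins.

```python
import numpy as np
import pandas as pd

def nest_from_multiindex(df, value_col):
    """Build a nested dict with one dict level per index level (hypothetical helper)."""
    nested = {}
    for key, value in df[value_col].items():
        # key is a tuple for a MultiIndex, a scalar for a flat index
        path = key if isinstance(key, tuple) else (key,)
        node = nested
        for part in path[:-1]:
            # create intermediate dicts on demand
            node = node.setdefault(part, {})
        # convert NumPy arrays to plain lists; leave other values untouched
        node[path[-1]] = value.tolist() if isinstance(value, np.ndarray) else value
    return nested

that = {'Food':['Apple','Apple','Apple','Apple','Banana','Banana','Orange','Orange'],
        'Color':['Red','Green','Yellow','Red','Red','Green','Green','Yellow'],
        'Type':['100','4','7','101','100','100','4','7'],
        'time':[np.linspace(0,10,2) for i in range(8)]}
nn = pd.DataFrame(that).set_index(['Food','Color','Type'])
d = nest_from_multiindex(nn, 'time')
```

Because it only relies on the index tuples, the same function should work unchanged if a fourth key column is added to `set_index`.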