
Efficient way to create a nested dictionary using pandas DataFrame columns as keys

I have a data frame like:

import pandas as pd

df = pd.DataFrame({'Geography': ['Geog1', 'Geog1', 'Geog1', 'Geog1', 'Geog2', 'Geog2', 'Geog2', 'Geog2'],
                   'Goal': ['G1', 'G1', 'G2', 'G2', 'G1', 'G1', 'G2', 'G2'],
                   'Indicator': ['G1I1', 'G1I2', 'G2I1', 'G2I2', 'G1I1', 'G1I2', 'G2I1', 'G2I2'],
                   'Year': [2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016],
                   'Data': [3, 5, 2, 6, 7, 4, 6, 6]})

and I want to convert it to a nested dictionary like:

[{'Geography': 'Geog1', 'Info': [{'Goal': 'G1', 'Indicators': [{'Indicator': 'G1I1', 'dataYears': [{'Year': 2016, 'Data': 3}]}, {'Indicator': 'G1I2', 'dataYears': [{'Year': 2016, 'Data': 15.0}, {'Year': 2011, 'Data': 21.0}]....

I've managed to do this with the following (highly inefficient) code:

# Note: the short form to_dict('r') no longer works in pandas 2.0+; use 'records'.
j = (df.groupby(['Geography', 'Goal', 'Indicator'])
     .apply(lambda x: x[['Year', 'Data']].to_dict('records'))
     .reset_index()
     .rename(columns={0: 'dataYear'}))
j = (j.groupby(['Geography', 'Goal'])
     .apply(lambda x: x[['Indicator', 'dataYear']].to_dict('records'))
     .reset_index()
     .rename(columns={0: 'Indicators'}))
j = (j.groupby(['Geography'])
     .apply(lambda x: x[['Goal', 'Indicators']].to_dict('records'))
     .reset_index()
     .rename(columns={0: 'Goals'})
     .to_dict('records'))

My question is: does anyone know a way to do this more efficiently? I have seen answers elsewhere, but they typically create a new nested level for each new column, whereas I want to include multiple columns at some levels of the dictionary (e.g., Year and Data).

You can easily fix it as follows:

x = [df.to_dict()]  # create a list x whose only element is the dictionary of your dataframe
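
On its own, `df.to_dict()` returns a column-oriented dictionary, not the nested layout shown in the question. As a minimal sketch of one alternative (not the original poster's method), the rows can be collected in a single pass into nested dicts and then reshaped into the list-of-dicts layout. The key names `Info`, `Indicators`, and `dataYears` are taken from the desired output in the question; `tree` and `nested` are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Geography': ['Geog1', 'Geog1', 'Geog1', 'Geog1', 'Geog2', 'Geog2', 'Geog2', 'Geog2'],
    'Goal': ['G1', 'G1', 'G2', 'G2', 'G1', 'G1', 'G2', 'G2'],
    'Indicator': ['G1I1', 'G1I2', 'G2I1', 'G2I2', 'G1I1', 'G1I2', 'G2I1', 'G2I2'],
    'Year': [2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016],
    'Data': [3, 5, 2, 6, 7, 4, 6, 6],
})

# Pass 1: walk the rows once, grouping them into
# Geography -> Goal -> Indicator -> list of {'Year', 'Data'} records.
# Plain dicts keep insertion order, so groups appear in first-seen order.
tree = {}
for row in df.itertuples(index=False):
    (tree.setdefault(row.Geography, {})
         .setdefault(row.Goal, {})
         .setdefault(row.Indicator, [])
         .append({'Year': row.Year, 'Data': row.Data}))

# Pass 2: reshape the nested dicts into the list-of-dicts layout
# (key names assumed from the desired output in the question).
nested = [
    {'Geography': geography,
     'Info': [
         {'Goal': goal,
          'Indicators': [
              {'Indicator': indicator, 'dataYears': years}
              for indicator, years in indicators.items()
          ]}
         for goal, indicators in goals.items()
     ]}
    for geography, goals in tree.items()
]
```

Whether this is actually faster than the chained `groupby`/`apply` version depends on the size and shape of the data, so it is worth timing both on the real frame.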
