
Import from a dictionary as a multi-index pd.DataFrame

I have a nested dictionary that calls for a multi-index, like the following:

dict = {'Main1' : {'A1' : {'a1' : 0}, 
                   'A2' : {'a2' : 15}, 
                   'A3' : {'a3' : 22}, 
                   'A4' : {'a4' : 130}},
        'Main2' : {'B1' : {'b1' : 150},
                   'B2' : {'b2' : 30},
                   'B3' : {'b3' : 1}}}

I would like to import it into Python as a pandas DataFrame like this:

col1    col2   col3   col4
Main1   A1     a1     0
Main1   A2     a2     15
Main1   A3     a3     22
Main1   A4     a4     130
Main2   B1     b1     150
Main2   B2     b2     30
Main2   B3     b3     1

Is that even possible, or should I try to find another way to import my data?

You can do it like this, flattening the nested dictionary into rows with a nested list comprehension:

import pandas as pd

# dict is the question's variable (note: the name shadows Python's built-in dict)
df = pd.DataFrame([(k1, k2, k3, v) for k1, k23v in dict.items()
                                   for k2, k3v in k23v.items()
                                   for k3, v in k3v.items()],
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])

Output:

    Col1 Col2 Col3  Col4
0  Main1   A1   a1     0
1  Main1   A2   a2    15
2  Main1   A3   a3    22
3  Main1   A4   a4   130
4  Main2   B1   b1   150
5  Main2   B2   b2    30
6  Main2   B3   b3     1
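The nested comprehension above hard-codes exactly three levels of keys. As a sketch for arbitrary depth, a small recursive generator can produce the same rows. The name flatten is a hypothetical helper (not part of pandas), and the sketch assumes every branch ends in a non-dict value at the same depth, so that all rows have the same length:

```python
import pandas as pd


def flatten(nested, prefix=()):
    """Yield (key1, key2, ..., value) tuples from an arbitrarily nested dict.

    Assumes every branch ends in a non-dict leaf at the same depth.
    """
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Descend one level, carrying the keys seen so far
            yield from flatten(value, path)
        else:
            # Leaf reached: emit the full key path followed by the value
            yield path + (value,)


d = {'Main1': {'A1': {'a1': 0},
               'A2': {'a2': 15},
               'A3': {'a3': 22},
               'A4': {'a4': 130}},
     'Main2': {'B1': {'b1': 150},
               'B2': {'b2': 30},
               'B3': {'b3': 1}}}

df = pd.DataFrame(list(flatten(d)), columns=['Col1', 'Col2', 'Col3', 'Col4'])
print(df)
```

If the depth changes, only the columns list needs to change; the traversal itself stays the same.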

This is one way, using pd.DataFrame.from_dict:

import pandas as pd

d = {'Main1' : {'A1' : {'a1' : 0}, 
                'A2' : {'a2' : 15}, 
                'A3' : {'a3' : 22}, 
                'A4' : {'a4' : 130}},
     'Main2' : {'B1' : {'b1' : 150},
                'B2' : {'b2' : 30},
                'B3' : {'b3' : 1}}}

# restructure dictionary to dictionary of tuple keys -> values
d2 = {(i, j, k): d[i][j][k] for i in d.keys()
                            for j in d[i].keys()
                            for k in d[i][j].keys()}

# construct dataframe from dictionary
df = pd.DataFrame.from_dict(d2, orient='index').reset_index()

# split column of tuples to multiple columns
df[['col1', 'col2', 'col3']] = df['index'].apply(pd.Series)

# clean up: remove unwanted columns, rename and sort
df = (df.drop(columns='index')
        .rename(columns={0: 'col4'})
        .sort_index(axis=1))

print(df)

    col1 col2 col3  col4
0  Main1   A1   a1     0
1  Main1   A2   a2    15
2  Main1   A3   a3    22
3  Main1   A4   a4   130
4  Main2   B1   b1   150
5  Main2   B2   b2    30
6  Main2   B3   b3     1
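A shorter variant of the same idea, as a sketch: a pandas Series built from a dict with tuple keys gets a MultiIndex automatically, so the split-and-rename steps can be replaced by naming the index levels and calling reset_index. The intermediate Series s is also the multi-indexed form asked about in the title:

```python
import pandas as pd

d = {'Main1': {'A1': {'a1': 0},
               'A2': {'a2': 15},
               'A3': {'a3': 22},
               'A4': {'a4': 130}},
     'Main2': {'B1': {'b1': 150},
               'B2': {'b2': 30},
               'B3': {'b3': 1}}}

# Flatten to (outer, middle, inner) tuple keys -> leaf values
d2 = {(i, j, k): v
      for i, middle in d.items()
      for j, inner in middle.items()
      for k, v in inner.items()}

# A Series built from a dict with tuple keys gets a MultiIndex
s = pd.Series(d2)

# Name the three index levels, then move them into ordinary columns
df = s.rename_axis(['col1', 'col2', 'col3']).reset_index(name='col4')
print(df)
```

Keeping s instead of df is useful if you want to select by key later, e.g. s.loc['Main1'].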

Another way I found is to build a dict of DataFrames, concatenate them side by side, then unstack and drop the NaN values:

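import pandas as pd
# d is the nested dictionary defined in the previous answer
dataframes = {k: pd.DataFrame(v) for k, v in d.items()}  # one mostly-NaN frame per outer key
dataframe = pd.concat(dataframes, axis=1)                 # outer keys become the top column level
output = dataframe.unstack().dropna()                     # long Series; drop the NaN padding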

Output:

Main1  A1  a1      0.0
       A2  a2     15.0
       A3  a3     22.0
       A4  a4    130.0
Main2  B1  b1    150.0
       B2  b2     30.0
       B3  b3      1.0
dtype: float64
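This approach returns a Series with a three-level MultiIndex and float64 values, because the NaN padding forces a float dtype. A sketch for turning it into the flat DataFrame from the question, assuming every leaf value is an integer (otherwise astype(int) would truncate):

```python
import pandas as pd

d = {'Main1': {'A1': {'a1': 0},
               'A2': {'a2': 15},
               'A3': {'a3': 22},
               'A4': {'a4': 130}},
     'Main2': {'B1': {'b1': 150},
               'B2': {'b2': 30},
               'B3': {'b3': 1}}}

dataframes = {k: pd.DataFrame(v) for k, v in d.items()}
output = pd.concat(dataframes, axis=1).unstack().dropna()

# Restore integer values, name the index levels, and move them into columns
df = (output.astype(int)
            .rename_axis(['col1', 'col2', 'col3'])
            .reset_index(name='col4'))
print(df)
```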

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 