简体   繁体   中英

Creating a Multi-Index / Hierarchical DataFrame from Dictionaries

Say I have the following dictionaries:

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': []}    
column_data_1      = {'foo': [2, 4, 5],       'bar': [2, 3],    'baz': []}

How can I create a multi-index DataFrame using these dictionaries?

It should be something like:

index_1  index_2     column_data_1
foo      A           2
         B           4
         C           5
bar      X           2
         Y           3
baz      np.NaN      np.NaN 

Note:

If NaN indices are not supported by Pandas, we can drop the empty entries in the dictionaries above.

Ideally, I would like the DataFrame to capture somehow the fact that those entries are missing if possible. However, the most important thing is being able to index the dataframe using the indices in multilevel_indices .

use concat :

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': []}    
column_data_1      = {'foo': [2, 4, 5],       'bar': [2, 3], 'baz': []}

pd.concat([pd.Series(column_data_1[k], index=multilevel_indices[k]) for k in multilevel_indices],
          keys=multilevel_indices.keys())

Results in:

foo  A    2
     B    4
     C    5
bar  X    2
     Y    3
dtype: float64

Also, as @CT Zhu mentioned, in the definitions for baz , if you change [] to [None] you can keep track of those entries:

baz  NaN    None
foo  A         2
     B         4
     C         5
bar  X         2
     Y         3
dtype: object

The original dataset that you have may not result in nan index, but change it a little bit will do.

In [137]:

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': [None]}    
column_data_1      = {'foo': [2, 4, 5],       'bar': [2, 3], 'baz': [None]}
mindex=pd.MultiIndex(levels=[multilevel_indices.keys(),list(chain(*multilevel_indices.values()))],
                     labels=[list(chain(*[[i]*len(v[1]) for i, v in enumerate(multilevel_indices.items())])),
                             range(sum(map(len, multilevel_indices.values())))],
                     names=['index_1',  'index_2'])
print pd.DataFrame(list(chain(*column_data_1.values())), index=mindex, columns=['column_data_1'])


                 column_data_1
index_1 index_2               
baz     NaN                NaN
foo     A                    2
        B                    4
        C                    5
bar     X                    2
        Y                    3

[6 rows x 1 columns]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM