
Convert to a MultiIndex DataFrame with horizontal display and rename columns

Let's say I have the following code:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(10,4), columns=['A','B','C','D'])
df1['dataframe'] = 'df1'
df2 = pd.DataFrame(np.random.randn(10,4), columns=['A','B','C','D'])
df2['dataframe'] = 'df2'
df = pd.concat([df1, df2])
df.reset_index().set_index(['dataframe','index'])

This returns a DataFrame with two index levels, 'dataframe' and 'index'. I'm not sure what the correct term is, but visually the first index level spans the rows rather than the columns.
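For reference, a minimal sketch of the setup, with the set_index result assigned to a new name, df_multi (a hypothetical name, not from the question), so the two index levels can be inspected. Both levels live on the row axis, which is why the two sub-frames stack vertically.

```python
import numpy as np
import pandas as pd

# Same construction as in the question
df1 = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
df1['dataframe'] = 'df1'
df2 = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
df2['dataframe'] = 'df2'
df = pd.concat([df1, df2])

# reset_index() turns the old 0-9 row labels into a column named 'index';
# set_index() then puts 'dataframe' and 'index' together on the row axis.
df_multi = df.reset_index().set_index(['dataframe', 'index'])

print(df_multi.index.names)  # both levels are row labels
print(df_multi.shape)        # (20, 4): the two sub-frames are stacked vertically
```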

There are two operations I would like to perform on this DataFrame that I'm struggling with.

1) I would like to rename the columns in each "sub-dataframe" to something different, taken from a separate list, applying the names according to the first index level assigned earlier. I have tried the following, but the change doesn't show up when I display "df" again:

new_cols = ['df1', 'df2']
for i,x in enumerate(new_cols):
    old_cols = df.loc[x].columns.tolist()
    df.loc[x].rename(columns={col_label: '{}_{}'.format(x,col_label) for col_label in old_cols}, inplace=True)

So, to be clear: instead of A, B, C, D I'd like df1_A...df1_D and df2_A...df2_D.

2) I would like to re-orient this DataFrame so that the sub-dataframes sit side by side across the columns, so that I scroll horizontally to view each "sub-dataframe" rather than up and down.

I've consulted the pandas API documentation but still haven't been able to get this right.
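A short sketch of why the loop in 1) has no visible effect, assuming df holds the two-level frame: df.loc['df1'] builds a separate DataFrame object, so rename(..., inplace=True) relabels only that temporary object and df keeps its own columns (older pandas versions may also emit a SettingWithCopyWarning here). More fundamentally, one DataFrame has a single set of column labels shared by every row, so the df1 rows and the df2 rows cannot carry different column names while they are stacked vertically; separate names only become possible once the sub-frames sit side by side, as in 2). The name sub is hypothetical.

```python
import numpy as np
import pandas as pd

# Two-level frame as in the question, with the set_index result assigned to df
df = pd.concat([
    pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D']).assign(dataframe=name)
    for name in ['df1', 'df2']
]).reset_index().set_index(['dataframe', 'index'])

# .loc with a first-level label returns a new DataFrame object
sub = df.loc['df1']
# inplace=True relabels only that new object, not df
sub.rename(columns=lambda c: 'df1_' + c, inplace=True)

print(list(sub.columns))  # renamed
print(list(df.columns))   # still A, B, C, D
```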

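OK, from that starting point (assuming you assigned the result of set_index back to df, so that df has the ['dataframe', 'index'] MultiIndex), you first want to move the dataframe level out of the index and back into a column:

df = df.reset_index(level='dataframe')

Now you should have only one level in the index (the original row numbers 0-9, repeated once per sub-frame) and the columns dataframe, A, B, C and D. If you never assigned the set_index result, the df from pd.concat is already in this shape and you can go straight to the pivot.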
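Why the level argument matters, sketched on a self-contained copy of the question's data: reset_index(drop=True) would throw away both index levels, including the df1/df2 labels that pivot needs, whereas reset_index(level='dataframe') moves only that level back into a column and keeps the 0-9 row numbers as the index. The names dropped and flat are hypothetical.

```python
import numpy as np
import pandas as pd

df_multi = pd.concat([
    pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D']).assign(dataframe=name)
    for name in ['df1', 'df2']
]).reset_index().set_index(['dataframe', 'index'])

# drop=True discards *both* index levels, so the df1/df2 labels are gone
dropped = df_multi.reset_index(drop=True)

# level='dataframe' moves only that level back into a column
flat = df_multi.reset_index(level='dataframe')

print('dataframe' in dropped.columns)  # False
print(flat.columns.tolist())           # 'dataframe' plus A-D
print(flat.index.name)                 # the remaining level, 'index'
```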

Now pivot by the dataframe column:

df_pivot = df.pivot(columns='dataframe')

You now have a DataFrame with hierarchically indexed columns (a column MultiIndex), so you can scroll across and see A, B, C and D at the top level, with df1 and df2 just underneath each one.
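A minimal sketch of the pivot step, to make the resulting column layout concrete. With only columns= given, pivot keeps the existing index (the 0-9 row numbers), so each row number appears once and the two sub-frames sit side by side. The names parts and flat are hypothetical.

```python
import numpy as np
import pandas as pd

parts = [
    pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D']).assign(dataframe=name)
    for name in ['df1', 'df2']
]
flat = (pd.concat(parts)
          .reset_index()
          .set_index(['dataframe', 'index'])
          .reset_index(level='dataframe'))

# The remaining columns (A-D) become the outer column level and the
# values of 'dataframe' (df1, df2) become the inner level.
df_pivot = flat.pivot(columns='dataframe')

print(df_pivot.shape)            # (10, 8): one row per 0-9 label, 4 x 2 columns
print(df_pivot.columns.nlevels)  # 2
```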

If you just want to explore the data, it's actually best to stop there. You can index in a natural way without renaming any columns, and it's easy to explore the data by scrolling horizontally. To get the A values from df1, you'd write:

df_pivot['A']['df1']
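The chained form works fine for reading. A sketch of two alternatives, rebuilding df_pivot the same way so the block runs on its own: a tuple key selects one column in a single step, and DataFrame.xs with level='dataframe' pulls out a whole sub-frame, which is handy for looking at df1 by itself. The names a_df1 and sub_df1 are hypothetical.

```python
import numpy as np
import pandas as pd

parts = [
    pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D']).assign(dataframe=name)
    for name in ['df1', 'df2']
]
df_pivot = (pd.concat(parts)
              .reset_index()
              .set_index(['dataframe', 'index'])
              .reset_index(level='dataframe')
              .pivot(columns='dataframe'))

# One column in a single step: a tuple key addresses both column levels
a_df1 = df_pivot[('A', 'df1')]

# A whole sub-frame: cross-section on the inner column level named 'dataframe'
sub_df1 = df_pivot.xs('df1', axis=1, level='dataframe')
print(sub_df1.columns.tolist())  # ['A', 'B', 'C', 'D']
```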

The df_pivot['A']['df1'] syntax reads naturally. But if you really do want underscore-joined names such as df1_A, you can flatten the columns like so:

df_pivot.columns = ['_'.join(col[::-1]).strip() for col in df_pivot.columns.values]

Because the pivot table's columns are a MultiIndex, df_pivot.columns.values returns an array of tuples. Each tuple looks like ('A', 'df1'), so if you want column names of the form df1_A you need the [::-1] I've added there, which joins the tuple members in reverse order. If you're happy with the other order for column names (A_df1), you can remove the reversal:

df_pivot.columns = ['_'.join(col).strip() for col in df_pivot.columns.values]
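One wrinkle with both versions: the outer column level after pivot is A-D, so the flattened names come out interleaved (df1_A, df2_A, df1_B, ...). If you would rather have each sub-frame's columns grouped together while scrolling across, one option is to swap the two column levels and sort before joining. A sketch using DataFrame.swaplevel and sort_index, which go beyond the steps above; grouped is a hypothetical name.

```python
import numpy as np
import pandas as pd

parts = [
    pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D']).assign(dataframe=name)
    for name in ['df1', 'df2']
]
df_pivot = (pd.concat(parts)
              .reset_index()
              .set_index(['dataframe', 'index'])
              .reset_index(level='dataframe')
              .pivot(columns='dataframe'))

# Move 'dataframe' (df1/df2) to the outer level, then sort so that all of
# df1's columns come before df2's.
grouped = df_pivot.swaplevel(axis=1).sort_index(axis=1)

# Join each (sub-frame, column) tuple into a single 'df1_A'-style label;
# no reversal is needed because the levels were already swapped.
grouped.columns = ['_'.join(col) for col in grouped.columns]
print(grouped.columns.tolist())
```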

Starting from the DataFrame produced by the question's pd.concat (before the set_index call, so "dataframe" is still an ordinary column), we can create two separate DataFrames according to which sub-frame each row belongs to, then drop the "dataframe" column from both. We then assign the new column names to each DataFrame's columns attribute. Finally, we index df1 with the list of df2's column names; since those columns don't exist in df1 yet, pandas creates them and fills them with df2's values, aligned on the shared 0-9 row index.

Edit: added a line from the pandas cookbook that turns the columns into a MultiIndex.

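import pandas as pd

# df is the concatenated frame from the question (before set_index)
df1 = df[df["dataframe"] == "df1"].copy().drop("dataframe", axis=1)
df2 = df[df["dataframe"] == "df2"].copy().drop("dataframe", axis=1)
# prefix every label; "df1_" + Index returns a new Index (no extra list around it)
df1.columns = "df1_" + df1.columns
df2.columns = "df2_" + df2.columns
# df2's columns don't exist in df1 yet, so they are created and filled from df2
df1[df2.columns] = df2
# split 'df1_A' -> ('df1', 'A') to build a two-level column header
df1.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df1.columns])
print(df1)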

Output (New):

        df1                                     df2                      \
          A         B         C         D         A         B         C   
0 -0.228363  0.675313 -0.076193 -0.805547  0.920632  0.789152  0.275401   
1  0.145603  0.422236  0.623796  0.233534  2.338283 -1.033269 -0.334333   
2 -0.526696  0.307727  0.478437 -0.068488 -0.475583 -0.802997 -0.059091   
3 -1.676880 -0.272451 -0.777794  0.490290  1.456024  0.340962 -0.436860   
4  1.203065 -0.198686 -1.065447  1.188931 -1.140757  0.046975 -2.596953   
5 -0.603939  0.734130 -0.321634  0.150161  2.228873  0.748693 -0.300975   
6  1.028938  0.114437  0.268499  0.260428 -1.896507  0.136147  0.004577   
7 -1.329070 -0.901562 -1.401573  0.715426 -1.711233  0.420301  0.643113   
8  2.033646 -0.550192  1.532104 -1.196995 -0.004135 -1.334320  0.110115   
9 -0.818145 -1.240037  0.880706 -0.625155 -0.672653  0.365357 -0.864840   


          D  
0 -0.888626  
1 -0.952065  
2  0.249387  
3  0.952315  
4 -1.804463  
5 -0.428231  
6 -0.257906  
7  1.551899  
8  0.054855  
9  0.679394  
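For comparison, pd.concat with axis=1 and keys= can split the frame and lay the pieces side by side in one call, producing the two-level column header directly; this is a swapped-in technique rather than the approach in this answer. And if you already have flat df1_A-style labels, Index.str.split('_', expand=True) returns a MultiIndex, an equivalent of the from_tuples line. A sketch, assuming df is the frame from the question's pd.concat (a "dataframe" column, 0-9 row labels repeated); names, wide and flat_cols are hypothetical names.

```python
import numpy as np
import pandas as pd

df = pd.concat([
    pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D']).assign(dataframe=name)
    for name in ['df1', 'df2']
])

names = ['df1', 'df2']
# One piece per sub-frame, without the label column; keys= becomes the
# outer column level and rows are aligned on the shared 0-9 index.
wide = pd.concat(
    [df[df['dataframe'] == n].drop(columns='dataframe') for n in names],
    axis=1,
    keys=names,
)
print(wide.shape)  # (10, 8)

# Going the other way: flat 'df1_A' labels -> MultiIndex via str.split
flat_cols = pd.Index([f'{a}_{b}' for a, b in wide.columns])
print(flat_cols.str.split('_', expand=True).tolist() == wide.columns.tolist())
```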
