
Selecting sublevels of Multiindex columns in pandas

I generate a DataFrame with MultiIndex columns, as in this example:

import pandas as pd
import numpy as np

iterables = [ ['co1', 'co2', 'co3', 'co4'], ['age','weight'] ]
multi = pd.MultiIndex.from_product(iterables, names= ["Spread", "attribute"])

df = pd.DataFrame(np.random.rand(80).reshape(10,8),index = range(0,10), columns = multi)

Each top-level column has a sub-column called 'weight' (in the level named 'attribute').

I need to generate a list or (preferably) Series that contains, for a given row, all the 'weight' sub-columns in that row. In the example picture, I'd want a Series that gave me 0.02, 0.46, 0.33, 0.47.

Can anyone suggest a nice way to do this? The solutions I've thought of are all gross, and I suspect I have an incomplete understanding of the indexing capabilities of pandas.


If I understand correctly, you can use loc and pass a tuple consisting of a slice and a column label to select the columns of interest at that level:

In [59]:
iterables = [ ['co1', 'co2', 'co3', 'co4'], ['age','weight'] ]
multi = pd.MultiIndex.from_product(iterables, names= ["Spread", "attribute"])
df = pd.DataFrame(np.random.rand(80).reshape(10,8),index = range(0,10), columns = multi)
df

Out[59]:
Spread          co1                 co2                 co3            \
attribute       age    weight       age    weight       age    weight   
0          0.600947  0.509537  0.605538  0.496002  0.215206  0.075079   
1          0.152956  0.922832  0.167788  0.024761  0.622378  0.983030   
2          0.712478  0.603798  0.407014  0.625474  0.445592  0.903240   
3          0.420569  0.576604  0.220097  0.401624  0.929464  0.512026   
4          0.273088  0.032303  0.607577  0.836231  0.751845  0.181522   
5          0.859699  0.274760  0.456812  0.666109  0.349961  0.237894   
6          0.632754  0.603252  0.157416  0.221576  0.068355  0.121864   
7          0.090595  0.035526  0.698262  0.525770  0.792618  0.220601   
8          0.670236  0.805195  0.310680  0.100464  0.875299  0.853238   
9          0.020501  0.405245  0.447614  0.999340  0.659616  0.709312   

Spread          co4            
attribute       age    weight  
0          0.297421  0.415730  
1          0.235259  0.156014  
2          0.365762  0.198299  
3          0.695431  0.478457  
4          0.331657  0.338436  
5          0.943810  0.097999  
6          0.638720  0.033747  
7          0.646969  0.475316  
8          0.623225  0.024976  
9          0.023494  0.959514  

In [61]:
df.loc[1,(slice(None),'weight')]

Out[61]:
Spread  attribute
co1     weight       0.922832
co2     weight       0.024761
co3     weight       0.983030
co4     weight       0.156014
Name: 1, dtype: float64

To explain the syntax:

df.loc[1,(slice(None),'weight')]

The first argument is the row (index) label. The second argument is a tuple with one entry per column level: slice(None) selects everything at the first level (in effect all of 'co1' to 'co4'), and 'weight' then selects, at the second level, the columns whose label matches 'weight'.
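For comparison, here is a sketch of two other spellings that should give the same values, assuming the same df as in the question. pd.IndexSlice lets you write the tuple with ordinary slice syntax, and DataFrame.xs takes a cross-section on a named level. The variable names (weights_slice, weights_xs, weights_list) are only for illustration.

```python
import numpy as np
import pandas as pd

# Same setup as in the question
iterables = [['co1', 'co2', 'co3', 'co4'], ['age', 'weight']]
multi = pd.MultiIndex.from_product(iterables, names=["Spread", "attribute"])
df = pd.DataFrame(np.random.rand(80).reshape(10, 8), index=range(10), columns=multi)

# Same selection as df.loc[1, (slice(None), 'weight')],
# written with pd.IndexSlice so ':' can be used instead of slice(None)
idx = pd.IndexSlice
weights_slice = df.loc[1, idx[:, 'weight']]

# Cross-section on the 'attribute' level of the columns.
# xs drops the selected level by default, so the result is indexed
# by 'Spread' only (co1..co4) rather than by (Spread, attribute).
weights_xs = df.xs('weight', axis=1, level='attribute').loc[1]

# If a plain Python list is wanted instead of a Series
weights_list = weights_xs.tolist()
```

The xs form may be more convenient if you want the result labelled just by 'co1' to 'co4'; the loc forms keep both column levels in the resulting index.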


 