
How to select multiple values from a single level of a DataFrame MultiIndex

If I have the following:

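import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((4, 8)))
# Pair each first-level label (a..h) with a second-level label (i, i, j, j, k, k, l, l)
tupleList = list(zip('abcdefgh', 'iijjkkll'))
ind = pd.MultiIndex.from_tuples(tupleList)
df.columns = ind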

In [71]: df
Out[71]: 
          a         b         c         d         e         f         g  \
          i         i         j         j         k         k         l   
0  0.968112  0.809183  0.144320  0.518120  0.820079  0.648237  0.971552   
1  0.959022  0.721705  0.139588  0.408940  0.230956  0.907192  0.467016   
2  0.335085  0.537437  0.725119  0.486447  0.114048  0.150150  0.894322   
3  0.051249  0.186547  0.779814  0.905914  0.024298  0.002489  0.339714   

          h  
          l  
0  0.438330  
1  0.225447  
2  0.331413  
3  0.530789  

[4 rows x 8 columns]

What is the easiest way to select the columns that have a second-level label of "j" or "k"?

          c         d         e         f
          j         j         k         k
0  0.948030  0.243993  0.627497  0.729024
1  0.087703  0.874968  0.581875  0.996466
2  0.802155  0.213450  0.375096  0.184569
3  0.164278  0.646088  0.201323  0.022498

I can do this:

df.loc[:, df.columns.get_level_values(1).isin(['j', 'k'])]

But that seems pretty verbose for something that feels like it should be simple. Any better approaches?

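Use the MultiIndex slicers introduced in pandas 0.14.0 (see the pandas documentation on MultiIndex slicers). pd.IndexSlice lets you write one selector per level: ":" for the first level, and a list of labels for the second.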

In [36]: idx = pd.IndexSlice

In [37]: df.loc[:, idx[:, ['j', 'k']]]
Out[37]: 
          c         d         e         f
          j         j         k         k
0  0.750582  0.877763  0.262696  0.226005
1  0.025902  0.967179  0.125647  0.297304
2  0.463544  0.104973  0.154113  0.284820
3  0.631695  0.841023  0.820907  0.938378
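
As a quick check, the sketch below rebuilds the frame from the question and compares the slicer form with the boolean-mask form. The fixed seed and the names by_slicer and by_mask are illustrative assumptions, not part of the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question; the seed only makes runs repeatable.
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_tuples(list(zip('abcdefgh', 'iijjkkll')))
df = pd.DataFrame(rng.random((4, 8)), columns=columns)

idx = pd.IndexSlice

# Slicer form: any first-level label, second-level label in ['j', 'k'].
by_slicer = df.loc[:, idx[:, ['j', 'k']]]

# Boolean-mask form from the question.
by_mask = df.loc[:, df.columns.get_level_values(1).isin(['j', 'k'])]

# Both selections keep columns c, d, e and f.
print(by_slicer.columns.tolist())
print(by_slicer.equals(by_mask))
```

The same slicer works on a row MultiIndex by moving it to the first position, for example df.loc[idx[:, ['j', 'k']], :].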
