Select a subset of a DataFrame on the second level of a pandas MultiIndex

Question

This is my DataFrame:

 import numpy as np
 import pandas as pd

 iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
 mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
 df = pd.DataFrame(np.random.randn(16, 3), index=mindex)

It looks like this:

                     0         1         2
first second                              
bar   one    -0.445212 -2.208192 -1.297759
      two     1.521942  0.592622 -1.677931
      three   0.709292  0.348715 -0.766430
      four   -1.812516 -0.982077 -1.155860
baz   one    -0.375230 -0.267912  2.621249
      two    -1.041991 -0.752277 -0.494512
      three  -1.029389 -0.331234  0.950335
      four   -1.357269  0.653581  1.289331
foo   one     0.980196  0.865067 -0.780575
      two    -1.641748  0.220253  2.141745
      three   0.272158 -0.320238  0.787176
      four   -0.265425 -0.767928  0.695651
qux   one    -0.117099  1.089503 -0.692016
      two    -0.203240 -0.314236  0.010321
      three   1.425749  0.268420 -0.886384
      four    0.181717 -0.268686  1.186988

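For each element of the first index level, I want to select the subset of the DataFrame that uses only the index values one and three from the second level of the MultiIndex.

I checked the advanced indexing section of the documentation for this, but without much success. It is possible to select a single index value from the second index level: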

df.loc['bar','one']
Out[74]: 
0   -0.445212
1   -2.208192
2   -1.297759
Name: (bar, one), dtype: float64

But not a tuple of values, because this:

df.loc[('bar',('one','three'))]

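results in an error:

KeyError: "None of [('one', 'three')] are in the [columns]"

I would expect .loc to basically deliver bar and then the rows whose second-level index values are one and three from this command.

How can I perform such a subselection based on a subset of MultiIndex level values?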

Answer 1

Add : to select all columns:

a = df.loc[('bar',('one','three')), :]
print (a)
                     0         1         2
first second                              
bar   one    -0.902444  2.115037 -0.065644
      three   2.095998  0.768128  0.413566
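
To illustrate why the trailing : matters, here is a minimal sketch that rebuilds the frame from the question. As far as I understand it, without the second indexer pandas splits the outer tuple into a row key and a column key, so ('one', 'three') is looked up among the columns. The inner list in the working call is my own substitution for the tuple; both select several labels from the level.

```python
import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(16, 3), index=mindex)

# Without ", :" the outer tuple is read as (row key, column key), so
# ('one', 'three') is searched for among the column labels 0, 1, 2.
try:
    df.loc[('bar', ('one', 'three'))]
except KeyError as err:
    print('KeyError:', err)

# With ", :" the whole tuple addresses the row MultiIndex,
# and the slice selects every column.
a = df.loc[('bar', ['one', 'three']), :]
print(a.index.tolist())
```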

A similar solution with IndexSlice:

idx = pd.IndexSlice
a = df.loc[idx['bar', ('one','three')], :]
print (a)
                     0         1         2
first second                              
bar   one    -0.515183 -0.858751  0.854838
      three   2.315598  0.402738 -0.184113

As @Brad Solomon mentioned, if you want all values of the first level:

df1 = df.loc[(slice(None), ['one', 'three']), :]

idx = pd.IndexSlice
df1 = df.loc[idx[:, ('one','three')], :]

print (df1)
                     0         1         2
first second                              
bar   one    -0.266926  1.105319  1.768572
      three  -0.632492 -1.642508 -0.779770
baz   one    -0.380545 -1.632120  0.435597
      three   0.018085  2.114032  0.888008
foo   one     0.539179  0.164681  1.598194
      three   0.051494  0.872987 -1.882287
qux   one    -1.361244 -1.520816  2.678428
      three   0.323771 -1.691334 -1.826938
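
As a rough cross-check, the two spellings above can be compared with a boolean mask built from the level values. The mask approach is not part of the answer above; it is an alternative I am adding for illustration, and the names wanted, by_slice, by_idx and by_mask are made up for this sketch.

```python
import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(16, 3), index=mindex)

wanted = ['one', 'three']

# slice(None) means "everything" on the first level.
by_slice = df.loc[(slice(None), wanted), :]

# pd.IndexSlice lets you write the same selection with ":" syntax.
by_idx = df.loc[pd.IndexSlice[:, wanted], :]

# Alternative (assumption, not from the answer): boolean mask on the level.
mask = df.index.get_level_values('second').isin(wanted)
by_mask = df.loc[mask]

print(by_slice.equals(by_idx))
print(by_mask.shape)
```

The mask version can be handy when the condition on the level is more than a plain list of labels, since any boolean array of the right length works.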

Answer 2

Another way:

df.sort_index().loc(axis=0)[:, ['one', 'three']]
#                     0         1         2
#first second                              
#bar   one     0.358878  0.774507 -1.366380
#      three   0.869764 -0.626074 -0.481729
#baz   one    -0.348540 -0.167700 -1.753537
#      three  -1.830668 -0.140482  0.604910
#foo   one     1.396874 -0.428031  0.228650
#      three   0.673802 -0.016591 -0.655399
#qux   one     1.341654  0.662983  0.185743
#      three  -0.898745 -0.847318  0.766237
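
A small self-contained sketch of the same call, in case it helps to see the pieces separately. Here loc(axis=0) tells pandas that every indexer inside the brackets refers to the row index, so no trailing : is needed; sort_index() puts the MultiIndex into lexicographic order first. The variable names sorted_df and result are only for this example.

```python
import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(16, 3), index=mindex)

# Sorting orders the second level alphabetically within each first-level key.
sorted_df = df.sort_index()
print(sorted_df.index.is_monotonic_increasing)

# axis=0 fixes the indexers to the rows: ":" covers the first level,
# the list picks the labels from the second level.
result = sorted_df.loc(axis=0)[:, ['one', 'three']]
print(result.index.tolist())
```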
