Selecting a subset of a DataFrame on the second level of a pandas MultiIndex

Question

This is the dataframe I have:

import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(16, 3), index=mindex)

And it looks like this:

                     0         1         2
first second                              
bar   one    -0.445212 -2.208192 -1.297759
      two     1.521942  0.592622 -1.677931
      three   0.709292  0.348715 -0.766430
      four   -1.812516 -0.982077 -1.155860
baz   one    -0.375230 -0.267912  2.621249
      two    -1.041991 -0.752277 -0.494512
      three  -1.029389 -0.331234  0.950335
      four   -1.357269  0.653581  1.289331
foo   one     0.980196  0.865067 -0.780575
      two    -1.641748  0.220253  2.141745
      three   0.272158 -0.320238  0.787176
      four   -0.265425 -0.767928  0.695651
qux   one    -0.117099  1.089503 -0.692016
      two    -0.203240 -0.314236  0.010321
      three   1.425749  0.268420 -0.886384
      four    0.181717 -0.268686  1.186988

I would like to select a subset of the dataframe for each element of the first index level, such that only the one and three values from the second level of the MultiIndex are used.

I have checked the advanced indexing section of the documentation, but without much success. Selecting a single second-level value for a given first-level value works:

df.loc['bar','one']
Out[74]: 
0   -0.445212
1   -2.208192
2   -1.297759
Name: (bar, one), dtype: float64

But it does not work with a tuple of second-level values. This:

df.loc[('bar',('one','three'))]

results in an error:

KeyError: "None of [('one', 'three')] are in the [columns]"

I expected this command to make .loc select bar and then the rows whose second-level index values are one and three.

How can I perform this kind of sub-selection based on subsets of a MultiIndex level?
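The KeyError comes from Python's subscript syntax rather than from the MultiIndex: the extra parentheses in df.loc[('bar', ('one', 'three'))] build exactly the same key as df.loc['bar', ('one', 'three')], and .loc reads a 2-tuple as (row key, column key). The sketch below shows this with a small hypothetical ShowKey class that only echoes the key it receives, then reproduces the error on the question's dataframe. The exact error wording differs between pandas versions, so only the mention of the columns is relied on.

```python
import numpy as np
import pandas as pd


class ShowKey:
    """Hypothetical helper: returns whatever key the [] operator receives."""

    def __getitem__(self, key):
        return key


show = ShowKey()
# Extra parentheses do not change the key: both spellings build the same 2-tuple.
same_key = show[('bar', ('one', 'three'))] == show['bar', ('one', 'three')]

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(16, 3), index=mindex)

# .loc splits the 2-tuple into a row key ('bar') and a column key
# (('one', 'three')), so the second-level labels are looked up in the columns.
try:
    df.loc[('bar', ('one', 'three'))]
    error_message = None
except KeyError as exc:
    error_message = str(exc)

print(same_key)
print(error_message)
```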

Answer 1

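Add : to select all columns. Without it, .loc reads the outer tuple as a (row key, column key) pair, so ('one', 'three') is looked up in the columns; with , : the whole tuple is used as the row key: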

a = df.loc[('bar',('one','three')), :]
print (a)
                     0         1         2
first second                              
bar   one    -0.902444  2.115037 -0.065644
      three   2.095998  0.768128  0.413566
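A quick check of this fix, as a sketch: it rebuilds the question's dataframe (the values are random, so they will not match the printed output above) and uses a list for the second-level labels instead of the answer's tuple, since a list is the usual way to ask for several labels at one level. The selected rows should be (bar, one) and (bar, three), i.e. positions 0 and 2 of the original frame.

```python
import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(16, 3), index=mindex)

# The trailing ', :' makes ('bar', [...]) the row key and keeps all columns.
a = df.loc[('bar', ['one', 'three']), :]
print(a)

# Same rows by position: (bar, one) is row 0 and (bar, three) is row 2.
same_as_positions = a.equals(df.iloc[[0, 2]])
```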

A similar solution with IndexSlice:

idx = pd.IndexSlice
a = df.loc[idx['bar', ('one','three')], :]
print (a)
                     0         1         2
first second                              
bar   one    -0.515183 -0.858751  0.854838
      three   2.315598  0.402738 -0.184113
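pd.IndexSlice does no selection of its own: indexing it simply returns the key it was given, which is why idx['bar', ('one', 'three')] behaves like the plain tuple. The sketch below checks this, and also shows that a bare : inside idx[...] becomes slice(None), the spelled-out form used for selecting all first-level values.

```python
import pandas as pd

idx = pd.IndexSlice

# IndexSlice[...] hands back its key unchanged, so this is just a tuple.
key = idx['bar', ['one', 'three']]

# A bare ':' inside the brackets is turned into slice(None) by Python itself.
all_first_key = idx[:, ['one', 'three']]

print(key)            # ('bar', ['one', 'three'])
print(all_first_key)  # (slice(None, None, None), ['one', 'three'])
```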

As @Brad Solomon mentioned, if you want all values of the first level:

df1 = df.loc[(slice(None), ['one', 'three']), :]

idx = pd.IndexSlice
df1 = df.loc[idx[:, ('one','three')], :]

print (df1)
                     0         1         2
first second                              
bar   one    -0.266926  1.105319  1.768572
      three  -0.632492 -1.642508 -0.779770
baz   one    -0.380545 -1.632120  0.435597
      three   0.018085  2.114032  0.888008
foo   one     0.539179  0.164681  1.598194
      three   0.051494  0.872987 -1.882287
qux   one    -1.361244 -1.520816  2.678428
      three   0.323771 -1.691334 -1.826938
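When the filter only concerns the second level, two other pandas tools can do the job; neither appears in the answers, so treat this as a swapped-in sketch. DataFrame.xs with level='second' takes a cross-section for a single label (and drops that level by default), while a boolean mask built from Index.get_level_values and isin handles several labels and keeps the original row order. The mask result is compared with the .loc version after sorting both, because the check is only meant to confirm they contain the same rows.

```python
import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(16, 3), index=mindex)

# One label: cross-section of every row whose 'second' label is 'one'.
# drop_level=True (the default) removes the 'second' level from the result.
one_rows = df.xs('one', level='second')

# Several labels: boolean mask over the values of the 'second' level.
mask = df.index.get_level_values('second').isin(['one', 'three'])
df_mask = df[mask]  # rows keep their original order

# The answer's .loc selection, for comparison.
df1 = df.loc[(slice(None), ['one', 'three']), :]

# Same set of rows; sort both so the check does not depend on row order.
same_rows = df_mask.sort_index().equals(df1.sort_index())
print(one_rows)
print(df_mask)
```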

Answer 2

Just another approach:

df.sort_index().loc(axis=0)[:, ['one', 'three']]
#                     0         1         2
#first second                              
#bar   one     0.358878  0.774507 -1.366380
#      three   0.869764 -0.626074 -0.481729
#baz   one    -0.348540 -0.167700 -1.753537
#      three  -1.830668 -0.140482  0.604910
#foo   one     1.396874 -0.428031  0.228650
#      three   0.673802 -0.016591 -0.655399
#qux   one     1.341654  0.662983  0.185743
#      three  -0.898745 -0.847318  0.766237
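Two details of this one-liner, sketched below with the question's setup. First, the index as built is not lexicographically sorted ('two' sorts after 'three'), and the pandas documentation recommends sorting a MultiIndex before slicing it, since label slicing on an unsorted level can raise UnsortedIndexError or emit a PerformanceWarning. Second, .loc(axis=0) applies the indexer to the rows only, so no trailing , : is needed.

```python
import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(16, 3), index=mindex)

# Built in the order one, two, three, four, the index is not sorted:
# ('bar', 'two') sorts after ('bar', 'three').
was_sorted = df.index.is_monotonic_increasing

df_sorted = df.sort_index()  # second level becomes four, one, three, two

# loc(axis=0): the indexer applies to rows only; ':' covers every first-level label.
result = df_sorted.loc(axis=0)[:, ['one', 'three']]
print(result)
```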

Answer 1: 5 votes, accepted, 2018-08-17 12:48:44

Answer 2: 1 vote, 2018-08-17 13:05:48