Selecting the 2nd MultiIndex Level of Pandas DataFrame as an Indexer
Selecting a subset of the dataframe in the second level of the pandas MultiIndex
Here is my dataframe:
import numpy as np
import pandas as pd
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(16, 3), index=mindex)
It looks like this:
                     0         1         2
first second
bar   one    -0.445212 -2.208192 -1.297759
      two     1.521942  0.592622 -1.677931
      three   0.709292  0.348715 -0.766430
      four   -1.812516 -0.982077 -1.155860
baz   one    -0.375230 -0.267912  2.621249
      two    -1.041991 -0.752277 -0.494512
      three  -1.029389 -0.331234  0.950335
      four   -1.357269  0.653581  1.289331
foo   one     0.980196  0.865067 -0.780575
      two    -1.641748  0.220253  2.141745
      three   0.272158 -0.320238  0.787176
      four   -0.265425 -0.767928  0.695651
qux   one    -0.117099  1.089503 -0.692016
      two    -0.203240 -0.314236  0.010321
      three   1.425749  0.268420 -0.886384
      four    0.181717 -0.268686  1.186988
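A minimal sketch of what the setup above produces, assuming a recent pandas and NumPy: from_product builds the Cartesian product of the two label lists, so the frame has 4 * 4 = 16 rows. The seeded generator (np.random.default_rng(0)) is an addition so that runs are repeatable; the values themselves do not matter for any of the selections below.

```python
import numpy as np
import pandas as pd

# Same construction as in the question; the seeded generator is an
# addition so that the values are reproducible between runs.
rng = np.random.default_rng(0)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(rng.standard_normal((16, 3)), index=mindex)

# from_product pairs every first-level label with every second-level label.
print(df.shape)                 # (16, 3)
print(df.index[:3].tolist())    # [('bar', 'one'), ('bar', 'two'), ('bar', 'three')]
print(list(df.index.names))     # ['first', 'second']
```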
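I want to select a subset of the dataframe for each element of the first index level, keeping only the 'one' and 'three' index values from the second level of the MultiIndex.
I have checked the advanced indexing section of the documentation without much success. It is possible to select a single index value from the second level: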
df.loc['bar','one']
Out[74]:
0 -0.445212
1 -2.208192
2 -1.297759
Name: (bar, one), dtype: float64
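For context, a sketch of how .loc treats MultiIndex keys of different lengths, assuming deterministic placeholder values from np.arange instead of random numbers: a key with one label per level returns a single row as a Series, while a key for the first level only returns a DataFrame indexed by the remaining level.

```python
import numpy as np
import pandas as pd

# Placeholder values instead of random numbers.
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(48.0).reshape(16, 3), index=mindex)

# A complete key (one label per level) picks out a single row -> Series.
row = df.loc[('bar', 'one')]
print(type(row).__name__, row.name)   # Series ('bar', 'one')

# A partial key (first level only) returns a DataFrame indexed by the
# remaining level, 'second'.
sub = df.loc['bar']
print(sub.index.tolist())             # ['one', 'two', 'three', 'four']
```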
But selecting a tuple of values does not work. This:
df.loc[('bar',('one','three'))]
results in an error:
KeyError: "None of [('one', 'three')] are in the [columns]"
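Why this fails: on a DataFrame, a 2-tuple passed to .loc is read as (row indexer, column indexer), which is why the error mentions [columns]. The sketch below reproduces this with placeholder data; the exact message wording differs between pandas versions, so it only checks that a KeyError about the columns is raised for both a tuple and a list of second-level labels.

```python
import numpy as np
import pandas as pd

# Placeholder values instead of random numbers.
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(48.0).reshape(16, 3), index=mindex)

# Both keys are 2-tuples, so .loc reads them as (rows, columns): 'bar' goes
# to the rows, and 'one'/'three' are looked up among the columns 0, 1, 2.
errors = []
for key in [('bar', ('one', 'three')), ('bar', ['one', 'three'])]:
    try:
        df.loc[key]
    except KeyError as exc:
        errors.append(str(exc))

print(len(errors))  # 2
```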
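I expected .loc, given this command, to select 'bar' and then the rows whose second-level index value is 'one' or 'three'.
How can I perform this kind of sub-selection based on a subset of the values in one MultiIndex level?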
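One alternative that does not appear in this thread is a boolean mask built with Index.get_level_values and Index.isin. A minimal sketch, with placeholder data from np.arange:

```python
import numpy as np
import pandas as pd

# Placeholder values instead of random numbers.
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(48.0).reshape(16, 3), index=mindex)

# One array of labels per index level, aligned with the rows.
first = df.index.get_level_values('first')
second = df.index.get_level_values('second')

# Rows of 'bar' whose second-level label is 'one' or 'three'.
bar_subset = df[(first == 'bar') & second.isin(['one', 'three'])]

# The same filter for every first-level value.
all_subset = df[second.isin(['one', 'three'])]

print(bar_subset.index.tolist())  # [('bar', 'one'), ('bar', 'three')]
print(all_subset.shape)           # (8, 3)
```

A mask sidesteps the row/column reading of tuple keys entirely, at the cost of evaluating a condition on every row.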
Add ':' as the column indexer to select all columns; the whole tuple is then used as the row key:
a = df.loc[('bar',('one','three')), :]
print(a)
                     0         1         2
first second
bar   one    -0.902444  2.115037 -0.065644
      three   2.095998  0.768128  0.413566
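A sketch of the same fix with placeholder data, using a list for the second-level labels (the form the pandas advanced indexing docs use for sets of labels): the explicit ':' makes the tuple the row key, and each element of the tuple is matched against one index level.

```python
import numpy as np
import pandas as pd

# Placeholder values instead of random numbers.
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(48.0).reshape(16, 3), index=mindex)

# With an explicit column indexer (:), ('bar', [...]) is the row key:
# 'bar' is matched on level 0 and the list on level 1.
a = df.loc[('bar', ['one', 'three']), :]
print(a.index.tolist())   # [('bar', 'one'), ('bar', 'three')]
print(a.shape)            # (2, 3)
```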
A similar solution with IndexSlice:
idx = pd.IndexSlice
a = df.loc[idx['bar', ('one','three')], :]
print(a)
                     0         1         2
first second
bar   one    -0.515183 -0.858751  0.854838
      three   2.315598  0.402738 -0.184113
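pd.IndexSlice performs no selection by itself; indexing it just returns the key tuple, with ':' turned into slice(None). A small sketch with placeholder data showing that it yields the same key, and therefore the same result, as writing the tuple by hand:

```python
import numpy as np
import pandas as pd

# Placeholder values instead of random numbers.
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(48.0).reshape(16, 3), index=mindex)

idx = pd.IndexSlice

# Indexing IndexSlice only builds the key; ':' becomes slice(None).
key = idx['bar', ['one', 'three']]
print(key)                       # ('bar', ['one', 'three'])
print(idx[:, ['one', 'three']])  # (slice(None, None, None), ['one', 'three'])

# So these two selections are the same call.
same = df.loc[idx['bar', ['one', 'three']], :].equals(
    df.loc[('bar', ['one', 'three']), :]
)
print(same)                      # True
```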
As @Brad Solomon mentioned, if you want all values of the first level:
# slice(None) selects every value of the first level
df1 = df.loc[(slice(None), ['one', 'three']), :]
# or, equivalently, with pd.IndexSlice
idx = pd.IndexSlice
df1 = df.loc[idx[:, ('one','three')], :]
print(df1)
                     0         1         2
first second
bar   one    -0.266926  1.105319  1.768572
      three  -0.632492 -1.642508 -0.779770
baz   one    -0.380545 -1.632120  0.435597
      three   0.018085  2.114032  0.888008
foo   one     0.539179  0.164681  1.598194
      three   0.051494  0.872987 -1.882287
qux   one    -1.361244 -1.520816  2.678428
      three   0.323771 -1.691334 -1.826938
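A sketch with placeholder data checking that the slice(None) and IndexSlice spellings select the same 8 rows. It also shows, as an extension beyond the thread, that each level can take its own list of labels (here only 'bar' and 'qux' on the first level):

```python
import numpy as np
import pandas as pd

# Placeholder values instead of random numbers.
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(48.0).reshape(16, 3), index=mindex)

idx = pd.IndexSlice
via_slice = df.loc[(slice(None), ['one', 'three']), :]
via_idx = df.loc[idx[:, ['one', 'three']], :]

print(via_slice.equals(via_idx))   # True
print(via_slice.shape)             # (8, 3)

# A list on the first level narrows that level the same way.
some = df.loc[(['bar', 'qux'], ['one', 'three']), :]
print(some.index.tolist())
# [('bar', 'one'), ('bar', 'three'), ('qux', 'one'), ('qux', 'three')]
```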
Another approach:
df.sort_index().loc(axis=0)[:, ['one', 'three']]
#                      0         1         2
# first second
# bar   one     0.358878  0.774507 -1.366380
#       three   0.869764 -0.626074 -0.481729
# baz   one    -0.348540 -0.167700 -1.753537
#       three  -1.830668 -0.140482  0.604910
# foo   one     1.396874 -0.428031  0.228650
#       three   0.673802 -0.016591 -0.655399
# qux   one     1.341654  0.662983  0.185743
#       three  -0.898745 -0.847318  0.766237