Selecting a subset of the dataframe in the second level of the pandas multiindex
This is the dataframe I have:
import numpy as np
import pandas as pd
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(16, 3), index=mindex)
And it looks like this:
                     0         1         2
first second
bar   one    -0.445212 -2.208192 -1.297759
      two     1.521942  0.592622 -1.677931
      three   0.709292  0.348715 -0.766430
      four   -1.812516 -0.982077 -1.155860
baz   one    -0.375230 -0.267912  2.621249
      two    -1.041991 -0.752277 -0.494512
      three  -1.029389 -0.331234  0.950335
      four   -1.357269  0.653581  1.289331
foo   one     0.980196  0.865067 -0.780575
      two    -1.641748  0.220253  2.141745
      three   0.272158 -0.320238  0.787176
      four   -0.265425 -0.767928  0.695651
qux   one    -0.117099  1.089503 -0.692016
      two    -0.203240 -0.314236  0.010321
      three   1.425749  0.268420 -0.886384
      four    0.181717 -0.268686  1.186988
I would like to select a subset of the dataframe for each element in the first index level, such that only the 'one' and 'three' index values from the second level of the multiindex are used.
I have checked this out in the advanced indexing section of the documentation, but without much success. One can sub-select a specific index value from the second index level:
df.loc['bar','one']
Out[74]:
0 -0.445212
1 -2.208192
2 -1.297759
Name: (bar, one), dtype: float64
But not a tuple of values, because this:
df.loc[('bar',('one','three'))]
results in an error:
KeyError: "None of [('one', 'three')] are in the [columns]"
I expected .loc to deliver 'bar' and then only the rows whose second-level index values are 'one' and 'three' with this command.
How can I perform this kind of sub-selection based on subsets of a multi-index level?
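Add : as the column indexer to select all columns. Without it, .loc reads the second element of the tuple as a column key, which is why the error mentions [columns]: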
a = df.loc[('bar',('one','three')), :]
print (a)
                     0         1         2
first second
bar   one    -0.902444  2.115037 -0.065644
      three   2.095998  0.768128  0.413566
Similar solution with IndexSlice:
idx = pd.IndexSlice
a = df.loc[idx['bar', ('one','three')], :]
print (a)
                     0         1         2
first second
bar   one    -0.515183 -0.858751  0.854838
      three   2.315598  0.402738 -0.184113
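As a quick check, the two spellings above can be compared on a reproducible copy of the frame. This is only a sketch: the seeded generator (rng) is an assumption added so the run is repeatable, and lists are used for the second-level labels instead of the tuples shown above, since lists are the form the pandas documentation uses for this kind of selection.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question; the seed is an assumption for reproducibility.
rng = np.random.default_rng(0)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(rng.standard_normal((16, 3)), index=mindex)

# Row key given as a tuple (first-level label, list of second-level labels),
# followed by ":" so that every column is kept.
a = df.loc[('bar', ['one', 'three']), :]

# The same selection written with IndexSlice.
idx = pd.IndexSlice
b = df.loc[idx['bar', ['one', 'three']], :]

print(a.index.tolist())  # the row labels that were selected
print(a.equals(b))       # whether both spellings return the same frame
```

Both forms keep the two index levels in the result, which makes it easy to combine the output with later selections.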
As @Brad Solomon mentioned, if you want all values of the first level:
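# Option 1: slice(None) selects every label on the first level
df1 = df.loc[(slice(None), ['one', 'three']), :]
# Option 2: the same selection written with IndexSlice
idx = pd.IndexSlice
df1 = df.loc[idx[:, ('one','three')], :]
print(df1)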
                     0         1         2
first second
bar   one    -0.266926  1.105319  1.768572
      three  -0.632492 -1.642508 -0.779770
baz   one    -0.380545 -1.632120  0.435597
      three   0.018085  2.114032  0.888008
foo   one     0.539179  0.164681  1.598194
      three   0.051494  0.872987 -1.882287
qux   one    -1.361244 -1.520816  2.678428
      three   0.323771 -1.691334 -1.826938
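A different route, not taken from the answers in this thread, is to build a boolean mask from the second index level with get_level_values and isin. The sketch below assumes the same frame as in the question (with a seeded generator added for reproducibility) and compares the mask result with the slice(None) form.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question; the seed is an assumption for reproducibility.
rng = np.random.default_rng(0)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(rng.standard_normal((16, 3)), index=mindex)

# Boolean mask: True where the 'second' level is 'one' or 'three'.
mask = df.index.get_level_values('second').isin(['one', 'three'])
by_mask = df[mask]

# The .loc form from the answer above, for comparison.
by_loc = df.loc[(slice(None), ['one', 'three']), :]

print(len(by_mask))
print(by_mask.sort_index().equals(by_loc.sort_index()))
```

The mask form can be convenient when the condition on the level is more involved than a fixed list of labels, since any boolean array of the right length can be used.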
Just another approach:
df.sort_index().loc(axis=0)[:, ['one', 'three']]
#                     0         1         2
#first second
#bar   one     0.358878  0.774507 -1.366380
#      three   0.869764 -0.626074 -0.481729
#baz   one    -0.348540 -0.167700 -1.753537
#      three  -1.830668 -0.140482  0.604910
#foo   one     1.396874 -0.428031  0.228650
#      three   0.673802 -0.016591 -0.655399
#qux   one     1.341654  0.662983  0.185743
#      three  -0.898745 -0.847318  0.766237
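To see how the .loc(axis=0) spelling relates to the tuple form from the earlier answer, a small comparison along these lines may help. It is a sketch on a seeded copy of the frame (the seed is an assumption), and both results are sorted before comparing so that row order does not affect the check.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question; the seed is an assumption for reproducibility.
rng = np.random.default_rng(0)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two', 'three', 'four']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(rng.standard_normal((16, 3)), index=mindex)

# .loc(axis=0) pins the indexer to the row axis, so no trailing ":" for the
# columns is needed and a bare ":" can stand for "all first-level labels".
by_axis = df.sort_index().loc(axis=0)[:, ['one', 'three']]

# Tuple form with slice(None), as in the earlier answer.
by_tuple = df.loc[(slice(None), ['one', 'three']), :]

print(len(by_axis))
print(by_axis.sort_index().equals(by_tuple.sort_index()))
```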