How to access a multi-level index in a pandas DataFrame?
I would like to select the rows that have the same index value.
This is the example DataFrame:
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
In [16]: df
Out[16]:
0 1 2 3
bar one -0.424972 0.567020 0.276232 -1.087401
two -0.673690 0.113648 -1.478427 0.524988
baz one 0.404705 0.577046 -1.715002 -1.039268
two -0.370647 -1.157892 -1.344312 0.844885
foo one 1.075770 -0.109050 1.643563 -1.469388
two 0.357021 -0.674600 -1.776904 -0.968914
qux one -1.294524 0.413738 0.276662 -0.472035
two -0.013960 -0.362543 -0.006154 -0.923061
I would like to select:
0 1 2 3
bar one -0.424972 0.567020 0.276232 -1.087401
baz one 0.404705 0.577046 -1.715002 -1.039268
foo one 1.075770 -0.109050 1.643563 -1.469388
qux one -1.294524 0.413738 0.276662 -0.472035
or even get it in this format:
0 1 2 3
one -0.424972 0.567020 0.276232 -1.087401
one 0.404705 0.577046 -1.715002 -1.039268
one 1.075770 -0.109050 1.643563 -1.469388
one -1.294524 0.413738 0.276662 -0.472035
I have tried df['bar','one'] and it's not working. I am not sure how I should access the multi-level index.
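For context on why that attempt fails: with a single pair of brackets, df[...] looks up columns, and the columns here are 0 to 3, so the tuple ('bar', 'one') raises a KeyError. The minimal sketch below rebuilds the example frame and shows the column lookup failing while .loc finds the row; the exact error text may differ between pandas versions.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

# df[...] selects columns; ('bar', 'one') is not a column label, so this fails
raised = False
try:
    df['bar', 'one']
except KeyError as err:
    raised = True
    print('KeyError:', err)

# Row labels of the MultiIndex are looked up with .loc instead
row = df.loc[('bar', 'one')]
print(row)
```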
You can use MultiIndex slicing (use slice(None) instead of a colon):
df = df.loc[(slice(None), 'one'), :]
Result:
0 1 2 3
bar one -0.424972 0.567020 0.276232 -1.087401
baz one 0.404705 0.577046 -1.715002 -1.039268
foo one 1.075770 -0.109050 1.643563 -1.469388
qux one -1.294524 0.413738 0.276662 -0.472035
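pandas also provides pd.IndexSlice, which lets you write the same slice with colon syntax instead of slice(None). A short sketch, assuming the index is sorted as in the example (MultiIndex slicing can raise UnsortedIndexError on an unsorted index):

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

idx = pd.IndexSlice
# idx[:, 'one'] is equivalent to (slice(None), 'one'):
# every first-level value, second level fixed to 'one'
selected = df.loc[idx[:, 'one'], :]
print(selected)
```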
Finally, you can drop the first index level:
df.index = df.index.droplevel(0)
Result:
0 1 2 3
one -0.424972 0.567020 0.276232 -1.087401
one 0.404705 0.577046 -1.715002 -1.039268
one 1.075770 -0.109050 1.643563 -1.469388
one -1.294524 0.413738 0.276662 -0.472035
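As an alternative to assigning df.index, DataFrame.droplevel returns a new frame without the given level, so the selection and the drop can be chained in one expression and the original df is left unchanged. A minimal sketch on the same example data:

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

# Select the 'one' rows, then drop the outer level (bar/baz/foo/qux)
flat = df.loc[(slice(None), 'one'), :].droplevel(0)
print(flat)
```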
Use DataFrame.xs, and if you need to keep both levels, add drop_level=False:
df1 = df.xs('one', level=1, drop_level=False)
print (df1)
0 1 2 3
bar one -0.424972 0.567020 0.276232 -1.087401
baz one 0.404705 0.577046 -1.715002 -1.039268
foo one 1.075770 -0.109050 1.643563 -1.469388
qux one -1.294524 0.413738 0.276662 -0.472035
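A different technique that gives the same rows is a boolean mask built from Index.get_level_values; this is not what the answer above uses, but it can be handy when the condition is more complex than a single label. A sketch on the example data:

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

# True where the second index level equals 'one'
mask = df.index.get_level_values(1) == 'one'
df_mask = df[mask]
print(df_mask)
```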
For the second format, remove the first level with DataFrame.reset_index and drop=True, so it is possible to select by label with DataFrame.loc:
df2 = df.reset_index(level=0, drop=True).loc['one']
#alternative
#df2 = df.xs('one', level=1, drop_level=False).reset_index(level=0, drop=True)
print (df2)
0 1 2 3
one -0.424972 0.567020 0.276232 -1.087401
one 0.404705 0.577046 -1.715002 -1.039268
one 1.075770 -0.109050 1.643563 -1.469388
one -1.294524 0.413738 0.276662 -0.472035
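This works because, after reset_index(level=0, drop=True), the index contains 'one' four times, and .loc with a duplicated label returns every matching row as a DataFrame; with a unique label, .loc returns a single row as a Series. A short sketch of that difference:

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

flat = df.reset_index(level=0, drop=True)  # index: one, two, one, two, ...
print(flat.index.is_unique)                # False

ones = flat.loc['one']                     # duplicated label -> DataFrame
single = df.xs('one', level=1).loc['bar']  # unique label -> Series
```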
More commonly, xs is used without keeping the selected level, so after selecting one, that level is removed:
df3 = df.xs('one', level=1)
print (df3)
0 1 2 3
bar -0.424972 0.567020 0.276232 -1.087401
baz 0.404705 0.577046 -1.715002 -1.039268
foo 1.075770 -0.109050 1.643563 -1.469388
qux -1.294524 0.413738 0.276662 -0.472035
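If the index levels have names, xs also accepts the level by name, which can be easier to read than a position. The names 'first' and 'second' below are hypothetical, chosen only for illustration; the example frame has unnamed levels.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

# Hypothetical level names, set on a new frame
named = df.rename_axis(['first', 'second'])

by_second = named.xs('one', level='second')  # rows where second == 'one'
by_first = named.xs('bar', level='first')    # rows where first == 'bar'
print(by_second)
print(by_first)
```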
Since the question involves a MultiIndex whose levels are ordered 'bar' first and then 'one', which can be verified with df.index:
MultiIndex([('bar', 'one'),
('bar', 'two'),
('baz', 'one'),
('baz', 'two'),
('foo', 'one'),
('foo', 'two'),
('qux', 'one'),
('qux', 'two')],
)
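Besides printing df.index, the structure of a MultiIndex can be inspected through a few attributes: nlevels gives the number of levels, levels holds the distinct values per level, get_level_values returns one level for every row, and a full tuple can be tested with in. A sketch on the example data:

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

print(df.index.nlevels)             # number of index levels
print(df.index.levels[0])           # distinct first-level values
print(df.index.get_level_values(1)) # second-level value of every row
print(('bar', 'one') in df.index)   # membership test with a full key
```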
The row you tried to reach with df['bar','one'] can be accessed using df.loc[('bar','one')].
The output it produces is:
0 0.162693
1 0.420518
2 -0.152041
3 -1.039439
Name: (bar, one), dtype: float64
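The shape of the result depends on how the key is passed to .loc: a single tuple returns that row as a Series, a list containing the tuple returns a one-row DataFrame that keeps the MultiIndex, and adding a column label returns a single value. A sketch on the example data:

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

row = df.loc[('bar', 'one')]       # Series named ('bar', 'one')
row_df = df.loc[[('bar', 'one')]]  # one-row DataFrame
cell = df.loc[('bar', 'one'), 2]   # scalar from column 2
```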
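Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.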