
How to access multi-level index in pandas data frame?

I would like to select the rows that share the same index label.

Here is the example data frame:

import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]

df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

In [16]: df
Out[16]: 
                0         1         2         3
bar one -0.424972  0.567020  0.276232 -1.087401
    two -0.673690  0.113648 -1.478427  0.524988
baz one  0.404705  0.577046 -1.715002 -1.039268
    two -0.370647 -1.157892 -1.344312  0.844885
foo one  1.075770 -0.109050  1.643563 -1.469388
    two  0.357021 -0.674600 -1.776904 -0.968914
qux one -1.294524  0.413738  0.276662 -0.472035
    two -0.013960 -0.362543 -0.006154 -0.923061

I would like to select:

                0         1         2         3
bar one -0.424972  0.567020  0.276232 -1.087401
baz one  0.404705  0.577046 -1.715002 -1.039268
foo one  1.075770 -0.109050  1.643563 -1.469388
qux one -1.294524  0.413738  0.276662 -0.472035

or even in this format:

            0         1         2         3
one -0.424972  0.567020  0.276232 -1.087401
one  0.404705  0.577046 -1.715002 -1.039268
one  1.075770 -0.109050  1.643563 -1.469388
one -1.294524  0.413738  0.276662 -0.472035

I have tried df['bar', 'one'] and it does not work. I am not sure how I should access the multi-level index.

You can use MultiIndex slicing (inside a tuple, write slice(None) instead of a colon):

df = df.loc[(slice(None), 'one'), :]

Result:

                0         1         2         3
bar one -0.424972  0.567020  0.276232 -1.087401
baz one  0.404705  0.577046 -1.715002 -1.039268
foo one  1.075770 -0.109050  1.643563 -1.469388
qux one -1.294524  0.413738  0.276662 -0.472035
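
If slice(None) feels awkward, pd.IndexSlice lets you write the colon directly. The following is a minimal sketch rather than part of the original answer; the seeded generator is an assumption added only so the frame is reproducible.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the seed is an assumption added for reproducibility.
rng = np.random.default_rng(0)
arrays = [
    np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
    np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
]
df = pd.DataFrame(rng.standard_normal((8, 4)), index=arrays)

# slice(None) means "everything on this level".
with_slice = df.loc[(slice(None), 'one'), :]

# pd.IndexSlice builds the same key while allowing the colon syntax.
idx = pd.IndexSlice
with_indexslice = df.loc[idx[:, 'one'], :]

print(with_indexslice.index.tolist())
```

Both spellings should select the same four rows and keep both index levels.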

Finally, you can drop the first index level:

df.index = df.index.droplevel(0)

Result:

            0         1         2         3
one -0.424972  0.567020  0.276232 -1.087401
one  0.404705  0.577046 -1.715002 -1.039268
one  1.075770 -0.109050  1.643563 -1.469388
one -1.294524  0.413738  0.276662 -0.472035
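
Assigning to df.index changes the frame in place. If you would rather keep the selected frame untouched, DataFrame.droplevel returns a new frame instead. A small sketch, again with a seeded generator that is my own assumption:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the seed is an assumption added for reproducibility.
rng = np.random.default_rng(0)
arrays = [
    np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
    np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
]
df = pd.DataFrame(rng.standard_normal((8, 4)), index=arrays)

# Select the 'one' rows, keeping both index levels.
selected = df.loc[(slice(None), 'one'), :]

# droplevel(0) returns a copy without the first level; 'selected' is unchanged.
flat = selected.droplevel(0)

print(flat.index.tolist())
print(selected.index.nlevels)
```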

Use DataFrame.xs, and if you need to keep both levels, add drop_level=False:

df1 = df.xs('one', level=1, drop_level=False)
print(df1)
                0         1         2         3
bar one -0.424972  0.567020  0.276232 -1.087401
baz one  0.404705  0.577046 -1.715002 -1.039268
foo one  1.075770 -0.109050  1.643563 -1.469388
qux one -1.294524  0.413738  0.276662 -0.472035

For the second format, remove the first level with DataFrame.reset_index and drop=True, which makes it possible to select by label with DataFrame.loc:

df2 = df.reset_index(level=0, drop=True).loc['one']
#alternative
#df2 = df.xs('one', level=1, drop_level=False).reset_index(level=0, drop=True)
print (df2)
            0         1         2         3
one -0.424972  0.567020  0.276232 -1.087401
one  0.404705  0.577046 -1.715002 -1.039268
one  1.075770 -0.109050  1.643563 -1.469388
one -1.294524  0.413738  0.276662 -0.472035

More commonly, xs is used without drop_level=False, so the level used for the selection (the one containing 'one') is removed from the result:

df3 = df.xs('one', level=1)
print (df3)
            0         1         2         3
bar -0.424972  0.567020  0.276232 -1.087401
baz  0.404705  0.577046 -1.715002 -1.039268
foo  1.075770 -0.109050  1.643563 -1.469388
qux -1.294524  0.413738  0.276662 -0.472035
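
xs also accepts a level name instead of a position, which can read more clearly. The sketch below assumes hypothetical level names 'first' and 'second' (the original frame has unnamed levels) and a seeded generator for reproducibility.

```python
import numpy as np
import pandas as pd

# 'first' and 'second' are hypothetical names; the original index is unnamed.
index = pd.MultiIndex.from_arrays(
    [
        ['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
        ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
    ],
    names=['first', 'second'],
)
rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility
df = pd.DataFrame(rng.standard_normal((8, 4)), index=index)

# Default: the selected level is dropped from the result.
dropped = df.xs('one', level='second')

# drop_level=False keeps both levels in the result.
kept = df.xs('one', level='second', drop_level=False)

print(dropped.index.tolist())
print(kept.index.tolist())
```

The two results hold the same values and differ only in how many index levels remain.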

The question involves a MultiIndex whose first level holds labels such as 'bar' and whose second level holds labels such as 'one'. You can verify this order with df.index:

MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           )

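A single row can be accessed by passing the full key as a tuple to df.loc, for example df.loc[('bar', 'one')]. Note that this returns only that one row, not every 'one' row.

It produces a Series (the values differ from the frame shown in the question because the data is random):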

0    0.162693
1    0.420518
2   -0.152041
3   -1.039439
Name: (bar, one), dtype: float64
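
To illustrate what a full tuple key returns, here is a short sketch; the seeded generator is an assumption added for reproducibility. Wrapping the tuple in a list is one way to get a one-row DataFrame instead of a Series.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the seed is an assumption added for reproducibility.
rng = np.random.default_rng(0)
arrays = [
    np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
    np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
]
df = pd.DataFrame(rng.standard_normal((8, 4)), index=arrays)

# A full key (one label per level) selects exactly one row as a Series.
row = df.loc[('bar', 'one')]

# A list containing the tuple keeps the result as a one-row DataFrame.
frame = df.loc[[('bar', 'one')]]

print(type(row).__name__, row.name)
print(type(frame).__name__, frame.shape)
```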
