
Select data at a particular level from a MultiIndex

I have the following pandas DataFrame with a MultiIndex (Z, A):

             H1       H2  
   Z    A 
0  100  200  0.3112   -0.4197   
1  100  201  0.2967   0.4893    
2  100  202  0.3084   -0.4873   
3  100  203  0.3069   NaN        
4  101  203  -0.4956  NaN       

Question: How can I select all items with A=203? I tried df[:,'A'] but it doesn't work. Then I found this in the online documentation, so I tried:
df.xs(203,level='A')
but I get:

TypeError: xs() got an unexpected keyword argument 'level'

Also, I don't see this parameter in the installed documentation (df.xs?):

Parameters
----------
key : object
    Some label contained in the index, or partially in a MultiIndex
axis : int, default 0
    Axis to retrieve cross-section on
copy : boolean, default True
    Whether to make a copy of the data

Note: I have the development version.

Edit: I found this thread. They recommend something like:

df.select(lambda x: x[1]==200, axis=0)  

I would still like to know what happened to df.xs with the level parameter, or what the recommended way is in the current version.
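For readers on a recent pandas: as far as I can tell, select was deprecated and later removed, so the call above may fail there. A minimal sketch of the same predicate written as a boolean mask follows. The frame is rebuilt from the table in the question, and the names mask and selected are only illustrative.

```python
import pandas as pd

# Rebuild a one-column version of the frame from the question (Z, A as MultiIndex).
index = pd.MultiIndex.from_tuples(
    [(100, 200), (100, 201), (100, 202), (100, 203), (101, 203)],
    names=["Z", "A"],
)
df = pd.DataFrame({"H1": [0.3112, 0.2967, 0.3084, 0.3069, -0.4956]}, index=index)

# Iterating over a MultiIndex yields one tuple per row, so label[1] is the A value.
mask = [label[1] == 200 for label in df.index]

# A list of booleans passed to .loc keeps the rows where the mask is True.
selected = df.loc[mask]
print(selected)
```

This evaluates the predicate in a Python loop over every index tuple, so it is flexible but likely slower than a level-based lookup on large frames.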

The problem was my (incorrect) assumption that I was running the development version, when in reality I had 1.6.1. You can check the currently installed version with:

import pandas
print(pandas.__version__)  # print() call form also works on Python 3

In the current version, df.xs() with the level parameter works fine.
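To make that concrete, here is a small sketch, assuming the frame from the question is rebuilt with Z and A as the two index levels. The variable names (by_xs, by_xs_full, by_mask) are mine, not from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question; Z and A form the MultiIndex.
index = pd.MultiIndex.from_tuples(
    [(100, 200), (100, 201), (100, 202), (100, 203), (101, 203)],
    names=["Z", "A"],
)
df = pd.DataFrame(
    {
        "H1": [0.3112, 0.2967, 0.3084, 0.3069, -0.4956],
        "H2": [-0.4197, 0.4893, -0.4873, np.nan, np.nan],
    },
    index=index,
)

# Cross-section on level "A"; the matched level is dropped by default,
# so the result is indexed by Z only.
by_xs = df.xs(203, level="A")

# Pass drop_level=False to keep the "A" level in the result.
by_xs_full = df.xs(203, level="A", drop_level=False)

# The same rows selected with a boolean mask over the level values.
by_mask = df[df.index.get_level_values("A") == 203]

print(by_xs)
```

The mask form keeps both index levels and extends naturally to other comparisons (for example `>= 202`), while xs is the shortest spelling for an exact match on one level.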

Not a direct answer to the question, but if you want to select more than one value you can use the slice() notation:

import numpy
from pandas import  MultiIndex, Series

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
              ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = Series(numpy.random.randn(8), index=index)

In [10]: s
Out[10]:
first  second
bar    one       0.181621
       two       1.016225
baz    one       0.716589
       two      -0.353731
foo    one      -0.326301
       two       1.009143
qux    one       0.098225
       two      -1.087523
dtype: float64

In [11]: s.loc[slice(None)]
Out[11]:
first  second
bar    one       0.181621
       two       1.016225
baz    one       0.716589
       two      -0.353731
foo    one      -0.326301
       two       1.009143
qux    one       0.098225
       two      -1.087523
dtype: float64

In [12]: s.loc[slice(None), "one"]
Out[12]:
first
bar      0.181621
baz      0.716589
foo     -0.326301
qux      0.098225
dtype: float64

In [13]: s.loc["bar", slice(None)]
Out[13]:
first  second
bar    one       0.181621
       two       1.016225
dtype: float64
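The same slice() idea should carry over to a DataFrame like the one in the question. The sketch below assumes that frame is rebuilt with Z and A as index levels; rows_203 and rows_202_203 are illustrative names. pd.IndexSlice is a pandas helper that lets you write the slices with the usual colon syntax.

```python
import pandas as pd

# Rebuild a one-column version of the frame from the question.
index = pd.MultiIndex.from_tuples(
    [(100, 200), (100, 201), (100, 202), (100, 203), (101, 203)],
    names=["Z", "A"],
)
df = pd.DataFrame({"H1": [0.3112, 0.2967, 0.3084, 0.3069, -0.4956]}, index=index)

# Sorting first is a precaution; this index is already sorted.
df = df.sort_index()

# Any Z, A equal to 203; the trailing ":" selects all columns.
rows_203 = df.loc[(slice(None), 203), :]

# pd.IndexSlice builds the same kind of tuple using colon syntax;
# a list selects several A values at once.
idx = pd.IndexSlice
rows_202_203 = df.loc[idx[:, [202, 203]], :]

print(rows_203)
print(rows_202_203)
```

For a DataFrame, the row selector goes in a tuple and the column selector is given explicitly, which avoids ambiguity between index levels and columns.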


 