Slicing a MultiIndex DataFrame by multiple values from a specified level
I want to slice a MultiIndex DataFrame by multiple values from a secondary level. For example, in the following DataFrame:
                val1  val2
ind1 ind2 ind3
1    6    s1      10     8
2    7    s1      20     6
3    8    s2      30     4
4    9    s2      50     2
5    10   s3      60     0
I wish to select only the rows in which ind3 == s1 or ind3 == s3:
           val1  val2
ind1 ind2
1    6       10     8
2    7       20     6
5    10      60     0
The best hypothetical option would be to pass multiple values to .xs, since it lets you state the desired level explicitly.
I could obviously concat all the DataFrames obtained by slicing on a single value:
In[2]: pd.concat([df.xs('s1',level=2), df.xs('s3',level=2)])
Out[2]:
           val1  val2
ind1 ind2
1    6       10     8
2    7       20     6
5    10      60     0
But (a) it is tedious and hard to read with more than two values, and (b) for large DataFrames it is quite heavy (or at least heavier than a multi-value slicing option, if one exists).
Here's the code to build the example DataFrame:
import pandas as pd
df = pd.DataFrame({'ind1':[1,2,3,4,5], 'ind2':[6,7,8,9,10], 'ind3':['s1','s1','s2','s2','s3'], 'val1':[10,20,30,50,60], 'val2':[8,6,4,2,0]}).set_index(['ind1','ind2','ind3'])
As with most selections from a DataFrame, you can use a mask or an indexer (loc in this case).
To get the mask, you can use get_level_values on the MultiIndex, followed by isin (both are covered in the docs).
m = df.index.get_level_values('ind3').isin(['s1', 's3'])
df[m].reset_index(level=2, drop=True)
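The mask approach above generalizes to any level and any list of values. The following is a minimal sketch of that idea; the helper name `select_by_level` and its `drop` parameter are hypothetical and not part of pandas, and the DataFrame is the example from the question.

```python
import pandas as pd

# Example DataFrame from the question.
df = pd.DataFrame({
    'ind1': [1, 2, 3, 4, 5],
    'ind2': [6, 7, 8, 9, 10],
    'ind3': ['s1', 's1', 's2', 's2', 's3'],
    'val1': [10, 20, 30, 50, 60],
    'val2': [8, 6, 4, 2, 0],
}).set_index(['ind1', 'ind2', 'ind3'])


def select_by_level(frame, level, values, drop=True):
    """Hypothetical helper: keep rows whose `level` value is in `values`.

    If `drop` is True, the filtered level is removed from the index.
    """
    # Boolean array, one entry per row, built from the chosen index level.
    mask = frame.index.get_level_values(level).isin(values)
    out = frame[mask]
    return out.droplevel(level) if drop else out


subset = select_by_level(df, 'ind3', ['s1', 's3'])
print(subset)
```

Passing the level by name ('ind3') rather than by position (2) keeps the call valid if the order of the index levels changes.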
To use loc:
df.loc[(slice(None), slice(None), ['s1', 's3']), :].reset_index(level=2, drop=True)
Both produce:
           val1  val2
ind1 ind2
1    6       10     8
2    7       20     6
5    10      60     0
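As a quick sanity check, the mask and loc selections can be compared with each other and with the concat approach from the question. This is only a sketch on the small example DataFrame; the variable names (`by_mask`, `by_loc`, `by_concat`) are chosen here for illustration.

```python
import pandas as pd

# Example DataFrame from the question.
df = pd.DataFrame({
    'ind1': [1, 2, 3, 4, 5],
    'ind2': [6, 7, 8, 9, 10],
    'ind3': ['s1', 's1', 's2', 's2', 's3'],
    'val1': [10, 20, 30, 50, 60],
    'val2': [8, 6, 4, 2, 0],
}).set_index(['ind1', 'ind2', 'ind3'])

values = ['s1', 's3']

# Mask: boolean array built from the 'ind3' level.
by_mask = df[df.index.get_level_values('ind3').isin(values)]

# loc: one indexer per index level, plus ':' for the columns.
by_loc = df.loc[(slice(None), slice(None), values), :]

# concat of single-value cross-sections, as in the question.
by_concat = pd.concat([df.xs(v, level='ind3') for v in values])

# Compare the selected rows after dropping the filtered level.
print(by_mask.droplevel('ind3').equals(by_loc.droplevel('ind3')))
print(by_mask.droplevel('ind3')['val1'].tolist() == by_concat['val1'].tolist())
```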
Note: the loc approach can also be written as shown in Alberto Garcia-Raboso's answer. Many people prefer that syntax because it is more consistent with the loc syntax for an Index. Both syntax styles are discussed in the docs.
You can use an IndexSlice:
idx = pd.IndexSlice
result = df.loc[idx[:, :, ['s1', 's3']], idx[:]]
result.index = result.index.droplevel('ind3')
print(result)
Output:
           val1  val2
ind1 ind2
1    6       10     8
2    7       20     6
5    10      60     0
The second line above can also be written as:
result = df.loc(axis=0)[idx[:, :, ['s1', 's3']]]
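The two spellings can be checked against each other on the example DataFrame. The sketch below assumes the DataFrame from the question; the names `two_arg` and `axis_form` are illustrative only. In the first form the column indexer is given explicitly so that the tuple is read as a row selection; in the second, `axis=0` states the same thing up front.

```python
import pandas as pd

# Example DataFrame from the question.
df = pd.DataFrame({
    'ind1': [1, 2, 3, 4, 5],
    'ind2': [6, 7, 8, 9, 10],
    'ind3': ['s1', 's1', 's2', 's2', 's3'],
    'val1': [10, 20, 30, 50, 60],
    'val2': [8, 6, 4, 2, 0],
}).set_index(['ind1', 'ind2', 'ind3'])

idx = pd.IndexSlice

# Form 1: row indexer and column indexer both spelled out.
two_arg = df.loc[idx[:, :, ['s1', 's3']], idx[:]]

# Form 2: fix the axis first, then pass only the row indexer.
axis_form = df.loc(axis=0)[idx[:, :, ['s1', 's3']]]

print(two_arg.equals(axis_form))

# Drop the filtered level to match the desired output.
result = axis_form.droplevel('ind3')
print(result)
```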