Slice a MultiIndex DataFrame by multiple values from a specified level

Question

I want to slice a MultiIndex DataFrame by multiple values from a secondary level. For example, in the following DataFrame:

                val1  val2
ind1 ind2 ind3            
1    6    s1      10     8
2    7    s1      20     6
3    8    s2      30     4
4    9    s2      50     2
5    10   s3      60     0

I wish to slice only the rows in which ind3 == s1 or ind3 == s3:

           val1  val2
ind1 ind2            
1    6       10     8
2    7       20     6
5    10      60     0

The best hypothetical option would be to pass multiple arguments to .xs, since it lets you state the desired level explicitly.

I could obviously concat all the sliced-by-single-value DataFrames:

In[2]: pd.concat([df.xs('s1',level=2), df.xs('s3',level=2)])
Out[2]:
           val1  val2
ind1 ind2            
1    6       10     8
2    7       20     6
5    10      60     0

But (a) it's tedious and not very readable when using more than 2 values, and (b) for large DataFrames it's quite heavy (or at least heavier than a multi-value slicing option, if one exists).
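
To make the cost concrete, the concat approach can be generalized to any number of values with a list comprehension. This is only a sketch of the baseline described above; the name `wanted` is hypothetical and not part of the original post. Each value triggers its own `xs` call and its own intermediate DataFrame, which is why this gets heavier as the list grows.

```python
import pandas as pd

# Rebuild the example DataFrame so the sketch is self-contained.
df = pd.DataFrame({
    'ind1': [1, 2, 3, 4, 5],
    'ind2': [6, 7, 8, 9, 10],
    'ind3': ['s1', 's1', 's2', 's2', 's3'],
    'val1': [10, 20, 30, 50, 60],
    'val2': [8, 6, 4, 2, 0],
}).set_index(['ind1', 'ind2', 'ind3'])

# Hypothetical list of values to keep from level 'ind3'.
wanted = ['s1', 's3']

# One cross-section per value (xs drops the selected level by default),
# then stitch the pieces back together.
pieces = [df.xs(value, level='ind3') for value in wanted]
result = pd.concat(pieces)
```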

Here's the code to build the example DataFrame:

import pandas as pd
df = pd.DataFrame({'ind1': [1, 2, 3, 4, 5], 'ind2': [6, 7, 8, 9, 10], 'ind3': ['s1', 's1', 's2', 's2', 's3'],
                   'val1': [10, 20, 30, 50, 60], 'val2': [8, 6, 4, 2, 0]})
df = df.set_index(['ind1', 'ind2', 'ind3'])

Answer 1

As with most selection from a DataFrame, you can use a mask or an indexer (loc in this case).

To get the mask, you can use get_level_values (docs) on the MultiIndex followed by isin (docs).

m = df.index.get_level_values('ind3').isin(['s1', 's3'])
df[m].reset_index(level=2, drop=True)

To use loc:

df.loc[(slice(None), slice(None), ['s1', 's3']), :].reset_index(level=2, drop=True)

Both output:

           val1  val2
ind1 ind2            
1    6       10     8
2    7       20     6
5    10      60     0

Note: the loc approach can also be written as shown in Alberto Garcia-Raboso's answer. Many people prefer that syntax because it is more consistent with the loc syntax for a plain Index. Both syntax styles are discussed in the docs.
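
As a rough illustration, the two approaches above can be run side by side and checked against each other. The helper name `select_by_level` is hypothetical (it is not a pandas function), and `DataFrame.droplevel` is used here in place of `reset_index(level=2, drop=True)` on the assumption that a reasonably recent pandas version is installed.

```python
import pandas as pd

df = pd.DataFrame({
    'ind1': [1, 2, 3, 4, 5],
    'ind2': [6, 7, 8, 9, 10],
    'ind3': ['s1', 's1', 's2', 's2', 's3'],
    'val1': [10, 20, 30, 50, 60],
    'val2': [8, 6, 4, 2, 0],
}).set_index(['ind1', 'ind2', 'ind3'])


def select_by_level(frame, level, values):
    """Hypothetical helper: keep rows whose index level is in `values`."""
    # Boolean mask built from one level of the MultiIndex.
    mask = frame.index.get_level_values(level).isin(values)
    # Drop the filtered level so the result matches the desired output.
    return frame[mask].droplevel(level)


by_mask = select_by_level(df, 'ind3', ['s1', 's3'])

# Same selection with loc: one entry per index level, slice(None) meaning "all".
by_loc = df.loc[(slice(None), slice(None), ['s1', 's3']), :].droplevel('ind3')
```

The mask version refers to the level by name, so it keeps working if the order of the index levels changes; the loc tuple depends on the position of ind3.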

Answer 2

You can use an IndexSlice:

idx = pd.IndexSlice
result = df.loc[idx[:, :, ['s1', 's3']], idx[:]]
result.index = result.index.droplevel('ind3')
print(result)

Output:

           val1  val2
ind1 ind2            
1    6       10     8
2    7       20     6
5    10      60     0

The second line above can also be written as:

result = df.loc(axis=0)[idx[:, :, ['s1', 's3']]]
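
A small self-contained sketch, assuming the example DataFrame from the question, suggests that the two spellings select the same rows. The variable names `explicit` and `axis_form` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'ind1': [1, 2, 3, 4, 5],
    'ind2': [6, 7, 8, 9, 10],
    'ind3': ['s1', 's1', 's2', 's2', 's3'],
    'val1': [10, 20, 30, 50, 60],
    'val2': [8, 6, 4, 2, 0],
}).set_index(['ind1', 'ind2', 'ind3'])

idx = pd.IndexSlice

# Form 1: row slicer plus an explicit "all columns" slicer.
explicit = df.loc[idx[:, :, ['s1', 's3']], idx[:]]

# Form 2: tell loc up front that the slicer applies to the rows (axis 0).
axis_form = df.loc(axis=0)[idx[:, :, ['s1', 's3']]]

# Both forms keep the ind3 level; drop it to match the desired output.
result = axis_form.droplevel('ind3')
```

The `axis=0` form avoids spelling out the column slicer, which can be easier to read when only the rows are being filtered.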

Answer 1
11, accepted, 2016-08-04 17:53:13

Answer 2
7, 2016-08-04 17:54:49
