
How to slice a MultiIndex DataFrame with dependent levels of the MultiIndex

I have a pandas DataFrame with a four-level MultiIndex. I am trying to select rows where the allowed level-4 index values differ for each level-1 index value.

Example:

In [68]: df = pd.DataFrame({'i1':[1,1,1,2,2,2],
                        'i2':[1,1,2,1,1,2],
                        'i3':[1,1,1,1,1,1],
                        'i4':[0,1,2,0,1,2],
                        'data':[1,1,2,2,1,1]}).set_index(['i1','i2','i3','i4'])


In [69]: df
Out[69]:
             data
i1 i2 i3 i4
1  1  1  0      1
         1      1
   2  1  2      2
2  1  1  0      2
         1      1
   2  1  2      1

Now I want to select rows as follows:

index i4 in [0, 1] for index i1 = 1

index i4 in [1, 2] for index i1 = 2

                 data
i1 i2 i3 i4
1  1  1  0      1
         1      1
2  1  1  1      1
   2  1  2      1

For now, this works:

    cond1 = (df.index.get_level_values('i1') == 1) & (df.index.get_level_values('i4').isin([0,1]))
    cond2 = (df.index.get_level_values('i1') == 2) & (df.index.get_level_values('i4').isin([1,2]))
    .
    .
    .
    condN = ...
    df[cond1 | cond2 | ... | condN]

But it looks like a bad solution. Is there a cleverer way of doing this?
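For reference, the repeated cond1 ... condN pattern above can be folded into a loop over a mapping. This is only a sketch of the question's own masking approach: the dictionary name `allowed` and its contents are assumptions made for illustration, using the example DataFrame from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'i1': [1, 1, 1, 2, 2, 2],
                   'i2': [1, 1, 2, 1, 1, 2],
                   'i3': [1, 1, 1, 1, 1, 1],
                   'i4': [0, 1, 2, 0, 1, 2],
                   'data': [1, 1, 2, 2, 1, 1]}).set_index(['i1', 'i2', 'i3', 'i4'])

# Hypothetical mapping: i1 value -> allowed i4 values for that i1.
allowed = {1: [0, 1], 2: [1, 2]}

i1 = df.index.get_level_values('i1')
i4 = df.index.get_level_values('i4')

# Start with an all-False mask and OR in one condition per mapping entry.
mask = np.zeros(len(df), dtype=bool)
for key, values in allowed.items():
    mask |= (i1 == key) & i4.isin(values)

result = df[mask]
```

Because this is plain boolean masking, it keeps the original row order and does not require the index to be sorted.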

You can make this a bit easier with IndexSlice, as follows:

idx = pd.IndexSlice
index1 = idx[1, :, :, 0:1]
index2 = idx[2, :, :, 1:2]
pd.concat([df.loc[index1], df.loc[index2]])
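As a self-contained sketch of the snippet above, here it is applied to the example DataFrame from the question. Two details are assumptions added here rather than part of the answer: the explicit `sort_index()` call (label slicing on a MultiIndex generally needs a sorted index) and the trailing `:` in `.loc`, which selects all columns explicitly.

```python
import pandas as pd

df = pd.DataFrame({'i1': [1, 1, 1, 2, 2, 2],
                   'i2': [1, 1, 2, 1, 1, 2],
                   'i3': [1, 1, 1, 1, 1, 1],
                   'i4': [0, 1, 2, 0, 1, 2],
                   'data': [1, 1, 2, 2, 1, 1]}).set_index(['i1', 'i2', 'i3', 'i4'])
df = df.sort_index()  # make sure the index is lexsorted before slicing

idx = pd.IndexSlice
result = pd.concat([
    df.loc[idx[1, :, :, 0:1], :],  # i1 == 1, i4 from 0 to 1 (both ends included)
    df.loc[idx[2, :, :, 1:2], :],  # i1 == 2, i4 from 1 to 2 (both ends included)
])
```

Note that label-based slices such as 0:1 include both endpoints, unlike ordinary Python slices.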

If you need to create many slices, you can store their values in a DataFrame, iterate over it to build the slices, and then use a list comprehension inside pd.concat to get the final object. Below, x['id1'] is assumed to be the value you want id1 to have, and I also assume that you always want to restrict the same two index levels.

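# index_df is assumed to hold one row per slice, with columns id1..id4.
# A missing (None/NaN) id2 or id3 means "take every value of that level".
indices = [
    idx[
        x['id1'],
        slice(None) if pd.isna(x['id2']) else x['id2'],
        slice(None) if pd.isna(x['id3']) else x['id3'],
        x['id4']
    ] for _, x in index_df.iterrows()  # iterrows() yields (label, row) pairs
]
pd.concat([df.loc[i] for i in indices])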
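To make the idea concrete, here is a minimal runnable sketch using the question's example data. The contents of index_df, its column names id1..id4, and the helper `level_or_all` are all hypothetical and chosen only for illustration; None is assumed to mean "no restriction on this level", and id4 is assumed to hold a list of allowed values.

```python
import pandas as pd

df = pd.DataFrame({'i1': [1, 1, 1, 2, 2, 2],
                   'i2': [1, 1, 2, 1, 1, 2],
                   'i3': [1, 1, 1, 1, 1, 1],
                   'i4': [0, 1, 2, 0, 1, 2],
                   'data': [1, 1, 2, 2, 1, 1]}).set_index(['i1', 'i2', 'i3', 'i4'])
df = df.sort_index()

# Hypothetical lookup table: one row per slice to extract.
index_df = pd.DataFrame({
    'id1': [1, 2],
    'id2': [None, None],       # None -> do not restrict level i2
    'id3': [None, None],       # None -> do not restrict level i3
    'id4': [[0, 1], [1, 2]],   # allowed i4 values for each id1
})

idx = pd.IndexSlice


def level_or_all(value):
    """Return the cell value, or slice(None) if the (scalar) cell is missing."""
    return slice(None) if pd.isna(value) else value


indices = [
    idx[row['id1'], level_or_all(row['id2']), level_or_all(row['id3']), row['id4']]
    for _, row in index_df.iterrows()
]
result = pd.concat([df.loc[i, :] for i in indices])
```

Storing lists in the id4 column is just one possible layout; a slice object per row would work with the same comprehension.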
