
How to slice a Pandas DataFrame with a MultiIndex index and a MultiIndex column?

I am looking to create a new DataFrame that contains the Silicon results for Devices A and B.

The following is the code I use to create the DataFrame:

import numpy as np
import pandas as pd

x = np.array(
    [
        [0.26, 0.92, 0.05, 0.43],
        [1.00, 0.62, 1.00, 1.00],
        [1.00, 0.97, 0.04, 1.00],
        [0.00, 1.00, 1.00, 0.88],
        [1.00, 1.00, 1.00, 0.79],
        [0.98, 1.00, 0.79, 0.99],
        [0.99, 1.00, 1.00, 1.00],
        [0.18, 1.00, 0.26, 1.00],
        [0.22, 0.00, 0.34, 0.82],
    ]
)
rowIndx = pd.MultiIndex.from_product(
    [["Slurm", "Zoidberg", "Wernstrom"], ["A", "B", "C"]],
    names=["Laboratory", "Device"],
)
colIndex = pd.MultiIndex.from_product(
    [["Replicant 1 ", "Replicant 2 "], ["Silicon", "Carbon"]]
)
robot = pd.DataFrame(data=x, index=rowIndx, columns=colIndex)
robot

Here is an image of the table: [image: the robot DataFrame]

This is the code I thought would work, but it only gives me errors, so now I don't know what to try:

robot[(robot.Device=="A") & (robot.Device=="B")][["Silicon"]]
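A hedged sketch of why that attempt fails, rebuilding `robot` from the question's code so it runs on its own. `Device` is a level of the row MultiIndex, not a column, so `robot.Device` raises `AttributeError`. And even with a proper mask, testing `== "A"` and `== "B"` joined by `&` can never be true for a single row. The names `device`, `both` and `either` are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame so this snippet is self-contained.
x = np.array([
    [0.26, 0.92, 0.05, 0.43],
    [1.00, 0.62, 1.00, 1.00],
    [1.00, 0.97, 0.04, 1.00],
    [0.00, 1.00, 1.00, 0.88],
    [1.00, 1.00, 1.00, 0.79],
    [0.98, 1.00, 0.79, 0.99],
    [0.99, 1.00, 1.00, 1.00],
    [0.18, 1.00, 0.26, 1.00],
    [0.22, 0.00, 0.34, 0.82],
])
rows = pd.MultiIndex.from_product(
    [["Slurm", "Zoidberg", "Wernstrom"], ["A", "B", "C"]],
    names=["Laboratory", "Device"],
)
cols = pd.MultiIndex.from_product(
    [["Replicant 1 ", "Replicant 2 "], ["Silicon", "Carbon"]]
)
robot = pd.DataFrame(data=x, index=rows, columns=cols)

# 1) "Device" is an index level, not a column, so attribute access fails.
try:
    robot.Device
    device_attr_ok = True
except AttributeError:
    device_attr_ok = False

# 2) A row's Device label cannot be "A" and "B" at the same time,
#    so & yields all False; | selects the intended rows.
device = robot.index.get_level_values("Device")
both = (device == "A") & (device == "B")
either = (device == "A") | (device == "B")

print(device_attr_ok)  # False
print(both.any())      # False
print(either.sum())    # 6
```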

I think you want something like this:

In [6]: robot.loc[:, (robot.columns.get_level_values(level=1)=='Silicon')]
Out[6]:
                  Replicant 1  Replicant 2
                       Silicon      Silicon
Laboratory Device
Slurm      A              0.26         0.05
           B              1.00         1.00
           C              1.00         0.04
Zoidberg   A              0.00         1.00
           B              1.00         1.00
           C              0.98         0.79
Wernstrom  A              0.99         1.00
           B              0.18         0.26
           C              0.22         0.34

Two key things here. The first is using robot.loc[ _ , _ ], i.e. passing two arguments, one for the index and one for the columns; each argument has to be something the corresponding MultiIndex (rows or columns) can understand.
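A minimal sketch of the two-argument form, assuming the `robot` DataFrame from the question (rebuilt below so the snippet runs on its own). With a MultiIndex on both axes, a full key on each axis is a tuple with one entry per level. Note that the question's column labels carry a trailing space (`"Replicant 1 "`), so the keys must include it too.

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame so this snippet is self-contained.
x = np.array([
    [0.26, 0.92, 0.05, 0.43],
    [1.00, 0.62, 1.00, 1.00],
    [1.00, 0.97, 0.04, 1.00],
    [0.00, 1.00, 1.00, 0.88],
    [1.00, 1.00, 1.00, 0.79],
    [0.98, 1.00, 0.79, 0.99],
    [0.99, 1.00, 1.00, 1.00],
    [0.18, 1.00, 0.26, 1.00],
    [0.22, 0.00, 0.34, 0.82],
])
rows = pd.MultiIndex.from_product(
    [["Slurm", "Zoidberg", "Wernstrom"], ["A", "B", "C"]],
    names=["Laboratory", "Device"],
)
cols = pd.MultiIndex.from_product(
    [["Replicant 1 ", "Replicant 2 "], ["Silicon", "Carbon"]]
)
robot = pd.DataFrame(data=x, index=rows, columns=cols)

# Full keys on both axes: (Laboratory, Device) and (replicant, element).
value = robot.loc[("Slurm", "A"), ("Replicant 1 ", "Silicon")]
print(value)  # 0.26

# A partial column key (outer level only) keeps every row and returns
# both inner columns of that replicant; the outer level is dropped.
rep1 = robot.loc[:, "Replicant 1 "]
print(list(rep1.columns))  # ['Silicon', 'Carbon']
```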

The second is robot.columns.get_level_values(level=1), which returns the level-1 label (Silicon or Carbon) of each of the 4 columns shown in the image of the DataFrame:

In [7]: robot.columns.get_level_values(level=1)
Out[7]: Index(['Silicon', 'Carbon', 'Silicon', 'Carbon'], dtype='object')

Comparing those labels with 'Silicon' produces a boolean mask, which .loc then uses to decide which columns to keep:

In [8]: robot.columns.get_level_values(level=1)=='Silicon'
Out[8]: array([ True, False,  True, False])
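The row levels in the question are named (`Laboratory`, `Device`), so they can be looked up by name as well as by position; the column levels were created without names, which is why `level=1` is used for them. A sketch, assuming the question's `robot` DataFrame; the column level names `Replicant` and `Element` below are hypothetical and assigned only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame so this snippet is self-contained.
x = np.array([
    [0.26, 0.92, 0.05, 0.43],
    [1.00, 0.62, 1.00, 1.00],
    [1.00, 0.97, 0.04, 1.00],
    [0.00, 1.00, 1.00, 0.88],
    [1.00, 1.00, 1.00, 0.79],
    [0.98, 1.00, 0.79, 0.99],
    [0.99, 1.00, 1.00, 1.00],
    [0.18, 1.00, 0.26, 1.00],
    [0.22, 0.00, 0.34, 0.82],
])
rows = pd.MultiIndex.from_product(
    [["Slurm", "Zoidberg", "Wernstrom"], ["A", "B", "C"]],
    names=["Laboratory", "Device"],
)
cols = pd.MultiIndex.from_product(
    [["Replicant 1 ", "Replicant 2 "], ["Silicon", "Carbon"]]
)
robot = pd.DataFrame(data=x, index=rows, columns=cols)

# Named row level: lookup by name and by position give the same labels.
by_name = robot.index.get_level_values("Device")
by_pos = robot.index.get_level_values(1)
print(by_name.equals(by_pos))  # True

# Column levels have no names here, so use integer positions.
print(list(robot.columns.names))  # [None, None]
col_mask = robot.columns.get_level_values(1) == "Silicon"
print(robot.loc[:, col_mask].shape)  # (9, 2)

# Hypothetical names, only to show name-based lookup on the columns.
named_cols = robot.columns.set_names(["Replicant", "Element"])
print((named_cols.get_level_values("Element") == "Silicon").sum())  # 2
```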

If you wanted more elements besides Silicon, you could combine the masks with the | operator (not &, because a column's level-1 label can't be both 'Silicon' and 'Carbon' at once), like this:

robot.loc[:, (robot.columns.get_level_values(level=1)=='Silicon')|(robot.columns.get_level_values(level=1)=='Carbon')]

or, a bit shorter:

lv = robot.columns.get_level_values(level=1)
robot.loc[:, (lv=='Silicon')|(lv=='Carbon')]
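Chaining `==` with `|` gets long as the list of labels grows. A hedged alternative, swapping in `Index.isin` (a standard pandas method, not part of the answer above), builds the same boolean mask from a list of labels. This sketch assumes the question's `robot` DataFrame.

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame so this snippet is self-contained.
x = np.array([
    [0.26, 0.92, 0.05, 0.43],
    [1.00, 0.62, 1.00, 1.00],
    [1.00, 0.97, 0.04, 1.00],
    [0.00, 1.00, 1.00, 0.88],
    [1.00, 1.00, 1.00, 0.79],
    [0.98, 1.00, 0.79, 0.99],
    [0.99, 1.00, 1.00, 1.00],
    [0.18, 1.00, 0.26, 1.00],
    [0.22, 0.00, 0.34, 0.82],
])
rows = pd.MultiIndex.from_product(
    [["Slurm", "Zoidberg", "Wernstrom"], ["A", "B", "C"]],
    names=["Laboratory", "Device"],
)
cols = pd.MultiIndex.from_product(
    [["Replicant 1 ", "Replicant 2 "], ["Silicon", "Carbon"]]
)
robot = pd.DataFrame(data=x, index=rows, columns=cols)

lv = robot.columns.get_level_values(level=1)

# The chained comparison from the answer...
cols_or = (lv == "Silicon") | (lv == "Carbon")
# ...and the same mask built with Index.isin from a list of labels.
cols_isin = lv.isin(["Silicon", "Carbon"])

print((cols_or == cols_isin).all())   # True
print(robot.loc[:, cols_isin].shape)  # (9, 4)
```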

UPDATE: If you also want to filter rows, use robot.index.get_level_values() in the same way as robot.columns.get_level_values(). Here's an example:

lv = robot.columns.get_level_values(level=1)
ilv = robot.index.get_level_values(level=1)
robot.loc[(ilv=='A')|(ilv=='B'), (lv=='Silicon')]

We've replaced the : (which selects all rows, across every level of the MultiIndex) with a boolean mask that filters the index, the same way we filtered the columns.

Your DataFrame has a MultiIndex, so you can select the rows for devices A and B with a boolean mask built from the Device level:

result = robot.iloc[(robot.index.get_level_values('Device') == 'A')|(robot.index.get_level_values('Device') == 'B')]

Then, if you only want the Silicon columns, use:

result.iloc[:, result.columns.get_level_values(1)== "Silicon"]
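A sketch checking that this two-step `.iloc` approach gives the same frame as a single `.loc` call with both masks, assuming the question's `robot` DataFrame. It works because `get_level_values(...) == 'A'` returns a plain NumPy boolean array, which `.iloc` accepts. The names `row_mask`, `silicon` and `one_step` are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame so this snippet is self-contained.
x = np.array([
    [0.26, 0.92, 0.05, 0.43],
    [1.00, 0.62, 1.00, 1.00],
    [1.00, 0.97, 0.04, 1.00],
    [0.00, 1.00, 1.00, 0.88],
    [1.00, 1.00, 1.00, 0.79],
    [0.98, 1.00, 0.79, 0.99],
    [0.99, 1.00, 1.00, 1.00],
    [0.18, 1.00, 0.26, 1.00],
    [0.22, 0.00, 0.34, 0.82],
])
rows = pd.MultiIndex.from_product(
    [["Slurm", "Zoidberg", "Wernstrom"], ["A", "B", "C"]],
    names=["Laboratory", "Device"],
)
cols = pd.MultiIndex.from_product(
    [["Replicant 1 ", "Replicant 2 "], ["Silicon", "Carbon"]]
)
robot = pd.DataFrame(data=x, index=rows, columns=cols)

device = robot.index.get_level_values("Device")
row_mask = (device == "A") | (device == "B")  # NumPy boolean array

# Two steps, as in this answer: rows first, then the Silicon columns.
result = robot.iloc[row_mask]
silicon = result.iloc[:, result.columns.get_level_values(1) == "Silicon"]

# The same selection in one .loc call with a mask for each axis.
one_step = robot.loc[row_mask, robot.columns.get_level_values(1) == "Silicon"]

print(silicon.shape)             # (6, 2)
print(silicon.equals(one_step))  # True
```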

Use slicers like this:

robot.loc[(slice(None), ['A', 'B']), (slice(None), 'Silicon')]

                  Replicant 1  Replicant 2 
                       Silicon      Silicon
Laboratory Device                          
Slurm      A              0.26         0.05
           B              1.00         1.00
Zoidberg   A              0.00         1.00
           B              1.00         1.00
Wernstrom  A              0.99         1.00
           B              0.18         0.26

or:

idx = pd.IndexSlice
robot.loc[idx[:, ['A', 'B']], idx[:, 'Silicon']]
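When only a single label on one level is needed, `DataFrame.xs` is another option (swapped in here as an alternative to the slicer, not taken from the answer). A sketch, assuming the question's `robot` DataFrame. Note that `xs` drops the selected level by default, so the result has plain replicant column labels instead of a two-level header; the row order of the two results is compared rather than assumed.

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame so this snippet is self-contained.
x = np.array([
    [0.26, 0.92, 0.05, 0.43],
    [1.00, 0.62, 1.00, 1.00],
    [1.00, 0.97, 0.04, 1.00],
    [0.00, 1.00, 1.00, 0.88],
    [1.00, 1.00, 1.00, 0.79],
    [0.98, 1.00, 0.79, 0.99],
    [0.99, 1.00, 1.00, 1.00],
    [0.18, 1.00, 0.26, 1.00],
    [0.22, 0.00, 0.34, 0.82],
])
rows = pd.MultiIndex.from_product(
    [["Slurm", "Zoidberg", "Wernstrom"], ["A", "B", "C"]],
    names=["Laboratory", "Device"],
)
cols = pd.MultiIndex.from_product(
    [["Replicant 1 ", "Replicant 2 "], ["Silicon", "Carbon"]]
)
robot = pd.DataFrame(data=x, index=rows, columns=cols)

idx = pd.IndexSlice
sliced = robot.loc[idx[:, ["A", "B"]], idx[:, "Silicon"]]
print(sliced.shape)  # (6, 2)

# xs picks one label on one column level and drops that level.
silicon = robot.xs("Silicon", axis=1, level=1)
print(list(silicon.columns))  # ['Replicant 1 ', 'Replicant 2 ']

# Filtering the rows the same way gives the same numbers as the slicer.
devices = silicon.loc[idx[:, ["A", "B"]], :]
print((devices.to_numpy() == sliced.to_numpy()).all())  # True
```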

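Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.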

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM