Indexing a MultiIndex DataFrame in pandas

Question

Consider the following example data:

import numpy as np
import pandas as pd

data = {"Taxon": ["Firmicutes"]*5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}

df = pd.DataFrame(data).set_index(["Taxon", "Patient"])
print(df)

                    Stool  Tissue
Taxon      Patient               
Firmicutes 0          740     389
           1          786     815
           2          178     265
           3          841     484
           4          211     534

How can I query the DataFrame using only the second-level index, Patient? For example, I'd like to get all the data for Patient 2.

I've tried df[df.index.get_level_values(1) == 2], and it works fine. But is there any way to achieve the same with one of the indexing methods loc, iloc, or ix?
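For reference, here is the boolean-mask approach from the question as a self-contained script. It is a minimal sketch: the seed is an assumption added only to make the run repeatable, and the mask is applied to df (the DataFrame) rather than to the raw data dict.

```python
import numpy as np
import pandas as pd

# Same layout as the question's data; the seed is added only for repeatability
np.random.seed(0)
data = {"Taxon": ["Firmicutes"] * 5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}
df = pd.DataFrame(data).set_index(["Taxon", "Patient"])

# Boolean mask built from the values of index level 1 (Patient)
mask = df.index.get_level_values(1) == 2
patient2 = df[mask]
print(patient2)
```

The mask keeps both index levels in the result, so the output still shows Taxon and Patient.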

Answer 1

I think the simplest approach is to use xs:

import numpy as np
import pandas as pd

np.random.seed(100)
names = ['Taxon', 'Patient']
mux = pd.MultiIndex.from_product([['Firmicutes', 'another'], range(1, 6)], names=names)
df = pd.DataFrame(np.random.randint(10, size=(10, 2)), columns=['Tissue', 'Stool'], index=mux)
print(df)
                    Tissue  Stool
Taxon      Patient               
Firmicutes 1             8      8
           2             3      7
           3             7      0
           4             4      2
           5             5      2
another    1             2      2
           2             1      0
           3             8      4
           4             0      9
           5             6      2

print(df.xs(2, level=1))
            Tissue  Stool
Taxon                    
Firmicutes       3      7
another          1      0

# to keep the Patient level in the result as well
print(df.xs(2, level=1, drop_level=False))
                    Tissue  Stool
Taxon      Patient               
Firmicutes 2             3      7
another    2             1      0
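The level argument of xs also accepts a level name instead of a position, which can be easier to read when the index has several levels. A minimal sketch, rebuilding the same df as above:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
names = ["Taxon", "Patient"]
mux = pd.MultiIndex.from_product([["Firmicutes", "another"], range(1, 6)], names=names)
df = pd.DataFrame(np.random.randint(10, size=(10, 2)), columns=["Tissue", "Stool"], index=mux)

# Select Patient 2 by level name and by level position
by_name = df.xs(2, level="Patient")
by_position = df.xs(2, level=1)
print(by_name)
```

Both calls drop the Patient level by default, leaving Taxon as the index.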

A solution with loc is also possible if you specify axis:

print(df.loc(axis=0)[:, 2])
                    Tissue  Stool
Taxon      Patient               
Firmicutes 2             3      7
another    2             1      0
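As a quick sanity check, a sketch like the following compares xs (keeping the level), loc with axis=0, and the boolean mask from the question on the same df. It assumes the df built in this answer; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(100)
names = ["Taxon", "Patient"]
mux = pd.MultiIndex.from_product([["Firmicutes", "another"], range(1, 6)], names=names)
df = pd.DataFrame(np.random.randint(10, size=(10, 2)), columns=["Tissue", "Stool"], index=mux)

# Three ways to get all rows for Patient 2 while keeping both index levels
via_xs = df.xs(2, level=1, drop_level=False)
via_loc = df.loc(axis=0)[:, 2]
via_mask = df[df.index.get_level_values("Patient") == 2]

print(via_loc)
```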

Answer 2

Yes, use pd.IndexSlice, which is exactly what you are looking for. See the documentation here.

Some dummy data:

import numpy as np
import pandas as pd

data = {"Taxon": ["Firmicutes"]*5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}

df = pd.DataFrame(data).set_index(["Taxon", "Patient"])
print(df)

                    Stool  Tissue
Taxon      Patient               
Firmicutes 0          158     137
           1          697     980
           2          751     759
           3          171     556
           4          701     620

You can write it explicitly like this:

df.loc[(slice(None), 2), :]

                    Stool  Tissue
Taxon      Patient               
Firmicutes 2          751     759
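The explicit tuple form also accepts a list of labels for a level, so several patients can be selected in one call. A minimal sketch on the same kind of dummy data; the seed and the chosen patients (1 and 3) are assumptions for illustration:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # added only for repeatability
data = {"Taxon": ["Firmicutes"] * 5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}
df = pd.DataFrame(data).set_index(["Taxon", "Patient"])

# slice(None) means "all Taxon values"; the list picks specific Patient labels
subset = df.loc[(slice(None), [1, 3]), :]
print(subset)
```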

Or you can use the more readable pd.IndexSlice:

idx = pd.IndexSlice
df.loc[idx[:, 2], :]

                    Stool  Tissue
Taxon      Patient               
Firmicutes 2          751     759
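pd.IndexSlice also works with a label range on the inner level and can be combined with a column selection. A minimal sketch, assuming the same dummy layout; the range 1:3 and the Stool column are only examples. Label slices with loc include both endpoints, and slicing a MultiIndex level expects a sorted index, hence the defensive sort_index():

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # added only for repeatability
data = {"Taxon": ["Firmicutes"] * 5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}
df = pd.DataFrame(data).set_index(["Taxon", "Patient"]).sort_index()

idx = pd.IndexSlice
# All Taxon values, Patients 1 through 3 inclusive, only the Stool column
subset = df.loc[idx[:, 1:3], "Stool"]
print(subset)
```

Because a single column label is given, the result is a Series named Stool rather than a DataFrame.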

2 answers

Answer 1: 2 votes, accepted, 2017-03-26 10:05:27

Answer 2: 0 votes, 2017-03-26 09:05:53
