
Indexing a MultiIndex DataFrame in pandas

Consider the following example data:

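import numpy as np
import pandas as pd

# Values are random (no seed), so the printed numbers will differ per run
data = {"Taxon": ["Firmicutes"]*5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}

df = pd.DataFrame(data).set_index(["Taxon", "Patient"])
print(df)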

                    Stool  Tissue
Taxon      Patient               
Firmicutes 0          740     389
           1          786     815
           2          178     265
           3          841     484
           4          211     534

So, how can I query the DataFrame using only the second-level index, Patient? For example, I'd like to get all the data for Patient 2.

I've tried df[df.index.get_level_values(1) == 2], and it worked fine. But is there any way to achieve the same with one of the indexing methods (loc, iloc or ix)?
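The boolean-mask approach from the question can be written as a self-contained sketch. The same mask can also be passed straight to loc, which is one way to get there with loc. The fixed seed and the names mask, by_brackets and by_loc are illustrative assumptions, not part of the original post:

```python
import numpy as np
import pandas as pd

# Same shape as the question's data; the seed is added only for repeatable runs
np.random.seed(0)
data = {"Taxon": ["Firmicutes"] * 5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}
df = pd.DataFrame(data).set_index(["Taxon", "Patient"])

# Boolean mask over the "Patient" level; using the level name instead of
# the position (level=1) gives the same result here
mask = df.index.get_level_values("Patient") == 2

# The mask works both with plain [] and with .loc
by_brackets = df[mask]
by_loc = df.loc[mask]
print(by_loc)
```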

I think the simplest is to use xs:

import numpy as np
import pandas as pd

np.random.seed(100)
names = ['Taxon', 'Patient']
mux = pd.MultiIndex.from_product([['Firmicutes', 'another'], range(1, 6)], names=names)
df = pd.DataFrame(np.random.randint(10, size=(10, 2)), columns=['Tissue', 'Stool'], index=mux)
print(df)
                    Tissue  Stool
Taxon      Patient               
Firmicutes 1             8      8
           2             3      7
           3             7      0
           4             4      2
           5             5      2
another    1             2      2
           2             1      0
           3             8      4
           4             0      9
           5             6      2

print(df.xs(2, level=1))
            Tissue  Stool
Taxon                    
Firmicutes       3      7
another          1      0

# if the Patient level should be kept as well
print(df.xs(2, level=1, drop_level=False))
                    Tissue  Stool
Taxon      Patient               
Firmicutes 2             3      7
another    2             1      0
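xs also accepts a level name instead of a position, which keeps working if the level order ever changes. A minimal sketch that rebuilds the frame above so it runs on its own; by_name and by_position are illustrative names:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
names = ["Taxon", "Patient"]
mux = pd.MultiIndex.from_product([["Firmicutes", "another"], range(1, 6)], names=names)
df = pd.DataFrame(np.random.randint(10, size=(10, 2)), columns=["Tissue", "Stool"], index=mux)

# Select Patient == 2 by level name; drop_level=False keeps the Patient level
by_name = df.xs(2, level="Patient", drop_level=False)
# Same selection by level position, for comparison
by_position = df.xs(2, level=1, drop_level=False)
print(by_name)
```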

A solution with loc is also possible if you specify axis:

print(df.loc(axis=0)[:, 2])
                    Tissue  Stool
Taxon      Patient               
Firmicutes 2             3      7
another    2             1      0
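The same loc(axis=0) form should also accept a list at the second level, for example to pull several patients at once. A sketch on the frame above; the patient numbers 2 and 4 and the name subset are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
mux = pd.MultiIndex.from_product([["Firmicutes", "another"], range(1, 6)],
                                 names=["Taxon", "Patient"])
df = pd.DataFrame(np.random.randint(10, size=(10, 2)),
                  columns=["Tissue", "Stool"], index=mux)

# axis=0 tells loc that the key applies to the row MultiIndex only,
# so no trailing ", :" for the columns is needed
subset = df.loc(axis=0)[:, [2, 4]]
print(subset)
```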

Yes, use pd.IndexSlice, which is exactly what you are looking for. See the pandas documentation on using slicers with a MultiIndex.

Some dummy data:

import numpy as np
import pandas as pd

data = {"Taxon": ["Firmicutes"]*5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}

df = pd.DataFrame(data).set_index(["Taxon", "Patient"])
print(df)

                    Stool  Tissue
Taxon      Patient               
Firmicutes 0          158     137
           1          697     980
           2          751     759
           3          171     556
           4          701     620

You can write it explicitly like this:

df.loc[(slice(None), 2), :]

                    Stool  Tissue
Taxon      Patient               
Firmicutes 2          751     759
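In the explicit form, the row key is a tuple with one entry per index level: slice(None) stands for "every Taxon" and 2 for Patient 2, while the second argument to loc picks columns. A minimal sketch showing that a column label can replace the ":"; the names rows_all_cols and rows_one_col are illustrative:

```python
import numpy as np
import pandas as pd

data = {"Taxon": ["Firmicutes"] * 5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}
df = pd.DataFrame(data).set_index(["Taxon", "Patient"])

# Row key: (all Taxon values, Patient == 2); column key ":" keeps all columns
rows_all_cols = df.loc[(slice(None), 2), :]
# Same rows, but only the "Stool" column, returned as a Series
rows_one_col = df.loc[(slice(None), 2), "Stool"]
print(rows_one_col)
```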

Or you can use the more readable pd.IndexSlice:

idx = pd.IndexSlice
df.loc[idx[:, 2], :]

                    Stool  Tissue
Taxon      Patient               
Firmicutes 2          751     759
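pd.IndexSlice only builds the key: idx[:, 2] produces the same tuple as the explicit (slice(None), 2) form. It also makes range selections on a level easy to read. A sketch, assuming a sorted index (sort_index() is called to guarantee it) and using the illustrative name patients_1_to_3; note that label slices include both endpoints:

```python
import numpy as np
import pandas as pd

data = {"Taxon": ["Firmicutes"] * 5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}
df = pd.DataFrame(data).set_index(["Taxon", "Patient"]).sort_index()

idx = pd.IndexSlice
print(idx[:, 2])  # → (slice(None, None, None), 2)

# Label slice on the Patient level: 1:3 keeps Patients 1, 2 and 3
patients_1_to_3 = df.loc[idx[:, 1:3], :]
print(patients_1_to_3)
```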

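Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM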