Indexing with multiindex dataframe in pandas
Consider the following example data:
import numpy as np
import pandas as pd

data = {"Taxon": ["Firmicutes"]*5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}
df = pd.DataFrame(data).set_index(["Taxon", "Patient"])
print(df)
                    Stool  Tissue
Taxon      Patient
Firmicutes 0          740     389
           1          786     815
           2          178     265
           3          841     484
           4          211     534
So, how can I query the dataframe using only the second index level, Patient? For example, I'd like to get all the data for Patient 2.
I've tried df[df.index.get_level_values(1)==2], and it worked fine. But is there any way to achieve the same with one of these indexing methods (loc, iloc or ix)?
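For reference, the boolean-mask approach from the question can be sketched as below. This is only an illustration: the seed and the names rng, mask and patient_2 are assumptions made here, and the level is referenced by its name ("Patient") instead of its position, which get_level_values also accepts.

```python
import numpy as np
import pandas as pd

# Rebuild data shaped like the question's (values are random; the seed is arbitrary)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Taxon": ["Firmicutes"] * 5,
    "Patient": range(5),
    "Tissue": rng.integers(0, 1000, size=5),
    "Stool": rng.integers(0, 1000, size=5),
}).set_index(["Taxon", "Patient"])

# Boolean mask built from the "Patient" level, referenced by name
mask = df.index.get_level_values("Patient") == 2
patient_2 = df[mask]
print(patient_2)
```

Referring to the level by name keeps the selection valid even if the order of the index levels changes later.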
I think the simplest is to use xs:
import numpy as np
import pandas as pd

np.random.seed(100)
names = ['Taxon','Patient']
mux = pd.MultiIndex.from_product([['Firmicutes', 'another'], range(1, 6)], names=names)
df = pd.DataFrame(np.random.randint(10, size=(10,2)), columns=['Tissue','Stool'], index=mux)
print(df)
                    Tissue  Stool
Taxon      Patient
Firmicutes 1             8      8
           2             3      7
           3             7      0
           4             4      2
           5             5      2
another    1             2      2
           2             1      0
           3             8      4
           4             0      9
           5             6      2
print(df.xs(2, level=1))
            Tissue  Stool
Taxon
Firmicutes       3      7
another          1      0
# if the Patient level should also be kept in the result
print(df.xs(2, level=1, drop_level=False))
                    Tissue  Stool
Taxon      Patient
Firmicutes 2             3      7
another    2             1      0
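As a small variation on the xs calls above, the level can also be given by name. The sketch below rebuilds the same frame and checks that the name-based and position-based calls agree; the variable names by_name, by_position and kept are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(100)
mux = pd.MultiIndex.from_product(
    [["Firmicutes", "another"], range(1, 6)], names=["Taxon", "Patient"]
)
df = pd.DataFrame(np.random.randint(10, size=(10, 2)),
                  columns=["Tissue", "Stool"], index=mux)

# Cross-section for Patient 2, addressing the level by name or by position
by_name = df.xs(2, level="Patient")
by_position = df.xs(2, level=1)

# drop_level=False keeps the Patient level in the resulting index
kept = df.xs(2, level="Patient", drop_level=False)

print(by_name.equals(by_position))
print(kept.index.nlevels)
```

Note that xs is meant for reading data; to assign to the selected rows, a loc-based selection is the usual route.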
A solution with loc is also possible if you specify axis:
print(df.loc(axis=0)[:, 2])
                    Tissue  Stool
Taxon      Patient
Firmicutes 2             3      7
another    2             1      0
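The same loc(axis=0) form should also accept a list or a label slice on the second level, which helps when more than one patient is needed. The following is a sketch under the assumption that the index is lexsorted (true for an index built with from_product as above); the names several and in_range are made up for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(100)
mux = pd.MultiIndex.from_product(
    [["Firmicutes", "another"], range(1, 6)], names=["Taxon", "Patient"]
)
df = pd.DataFrame(np.random.randint(10, size=(10, 2)),
                  columns=["Tissue", "Stool"], index=mux)

# All taxa, patients 2 and 4 only
several = df.loc(axis=0)[:, [2, 4]]

# All taxa, patients 2 through 4 (label slices include both endpoints)
in_range = df.loc(axis=0)[:, 2:4]

print(several)
print(in_range)
```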
Yes, use pd.IndexSlice, which is exactly what you are looking for. See the pandas documentation on using slicers with a MultiIndex.
Some dummy data:
data = {"Taxon": ["Firmicutes"]*5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}
df = pd.DataFrame(data).set_index(["Taxon", "Patient"])
print(df)
                    Stool  Tissue
Taxon      Patient
Firmicutes 0          158     137
           1          697     980
           2          751     759
           3          171     556
           4          701     620
You can write it explicitly like this:
df.loc[(slice(None), 2), :]
                    Stool  Tissue
Taxon      Patient
Firmicutes 2          751     759
Or you may use the more readable pd.IndexSlice:
idx = pd.IndexSlice
df.loc[idx[:, 2], :]
                    Stool  Tissue
Taxon      Patient
Firmicutes 2          751     759
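To see how the approaches in this thread relate, here is a minimal sketch that runs each of them on one frame and compares the results. The frame is rebuilt with an arbitrary seed, and the names in the candidates dictionary are purely illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(100)  # arbitrary seed, only to make the sketch repeatable
mux = pd.MultiIndex.from_product(
    [["Firmicutes", "another"], range(1, 6)], names=["Taxon", "Patient"]
)
df = pd.DataFrame(np.random.randint(10, size=(10, 2)),
                  columns=["Tissue", "Stool"], index=mux)

candidates = {
    "mask": df[df.index.get_level_values("Patient") == 2],
    "xs": df.xs(2, level="Patient", drop_level=False),
    "loc_axis": df.loc(axis=0)[:, 2],
    "slice_none": df.loc[(slice(None), 2), :],
    "index_slice": df.loc[pd.IndexSlice[:, 2], :],
}

# Compare every result against the boolean-mask baseline
reference = candidates["mask"]
same = {name: result.equals(reference) for name, result in candidates.items()}
print(same)
```

Since all of these select the same rows here, the choice is mostly about readability; xs is the shortest for a single label, while the loc forms extend naturally to lists and slices of labels.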