Pandas dataframe with MultiIndex: check if string is contained in index level
Let's say I have a multi-indexed pandas dataframe that looks like the following one, taken from the documentation.
import numpy as np
import pandas as pd
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
Which looks like this:
                0         1         2         3
bar one -0.096648 -0.080298  0.859359 -0.030288
    two  0.043107 -0.431791  1.923893 -1.544845
baz one  0.639951 -0.008833 -0.227000  0.042315
    two  0.705281  0.446257 -1.108522  0.471676
foo one -0.579483 -2.261138 -0.826789  1.543524
    two -0.358526  1.416211  1.589617  0.284130
qux one  0.498149 -0.296404  0.127512 -0.224526
    two -0.286687 -0.040473  1.443701  1.025008
Now I only want the rows where "ne" is contained in the second level of the MultiIndex.
Is there any way to slice the MultiIndex by (partially) contained strings?
You can apply a boolean mask built from the index level, like this:
df = df.iloc[df.index.get_level_values(1).str.contains('ne')]
which returns:
bar one -0.143200  0.523617  0.376458 -2.091154
baz one -0.198220  1.234587 -0.232862 -0.510039
foo one -0.426127  0.594426  0.457331 -0.459682
qux one -0.875160 -0.157073 -0.540459 -1.792235
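The same idea can be written as a self-contained sketch with the mask held in its own variable, which makes it easier to inspect. The deterministic values from np.arange and the names mask and result are assumptions made for illustration; plain boolean indexing (df[mask]) is used here as an alternative to .iloc, and regex=False makes 'ne' a literal substring match.

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
    np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
]
# Deterministic values (an assumption, for reproducibility) instead of random ones.
df = pd.DataFrame(np.arange(32).reshape(8, 4), index=arrays)

# Boolean mask: True where the second index level contains the substring "ne".
# regex=False treats "ne" as a literal string rather than a regular expression.
level_values = df.index.get_level_values(1)
mask = np.asarray(level_values.str.contains('ne', regex=False), dtype=bool)

# Plain boolean indexing keeps only the rows where the mask is True.
result = df[mask]
print(result)
```

Keeping the result in a new variable, rather than reassigning df, leaves the original frame available for further filtering.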
EDIT: It is also possible to apply a logical mask across multiple levels, e.g.:
df = df.iloc[(df.index.get_level_values(0).str.contains('ba')) | (df.index.get_level_values(1).str.contains('ne'))]
returns:
bar one  0.620279  1.525277  0.379649 -0.032608
    two  0.465240 -0.190038  0.795730  1.720368
baz one  0.986828 -0.080394 -0.303319  0.747483
    two  0.487534  1.597006  0.114551  0.299502
foo one -0.085700  0.112433  0.704043  0.264280
qux one -0.291758 -1.071669  0.794354 -1.805530
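As a rough sketch of how per-level masks combine, the example below builds one mask per level and joins them with | (either condition) and & (both conditions). The deterministic data and the names level0_mask, level1_mask, any_match and both_match are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
    np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
]
# Deterministic values (an assumption, for reproducibility) instead of random ones.
df = pd.DataFrame(np.arange(32).reshape(8, 4), index=arrays)

# One boolean mask per index level.
level0_mask = np.asarray(
    df.index.get_level_values(0).str.contains('ba', regex=False), dtype=bool)
level1_mask = np.asarray(
    df.index.get_level_values(1).str.contains('ne', regex=False), dtype=bool)

# Rows where either level matches (logical OR).
any_match = df[level0_mask | level1_mask]

# Rows where both levels match (logical AND).
both_match = df[level0_mask & level1_mask]

print(any_match)
print(both_match)
```

With this data the OR mask keeps six rows (all of bar and baz, plus the "one" rows of foo and qux), while the AND mask keeps only ('bar', 'one') and ('baz', 'one').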