Subset pandas dataframe using regex
I have a pandas DataFrame that looks like this:
>>> df
product desc
0 ABCD desc1
1 ABCD1,XYZ desc2
2 ABCD1H desc3
3 ABCD1 desc4
4 ABCD1H,LMN desc5
I want to keep only the rows whose product is ABCD1, or ABCD1 followed by any other product ID, but not ABCD1H. How can I select such rows? In the above example, I want the output to be:
>>> df
product desc
1 ABCD1,XYZ desc2
3 ABCD1 desc4
This is what I have tried so far, but it does not work:
df2 = df.loc[df['product'].str.contains('ABCD1')]
It also includes ABCD1H in its results, which I don't want.
Use \b in the regex; it matches a word boundary, so ABCD1 no longer matches when it is immediately followed by another letter or digit:
df[df['product'].str.contains(r'ABCD1\b')]
Output:
product desc
1 ABCD1,XYZ desc2
3 ABCD1 desc4