Subset a pandas DataFrame using a regular expression

Question

I have a pandas DataFrame that looks like this:

>>> df
      product   desc
0        ABCD  desc1
1   ABCD1,XYZ  desc2
2      ABCD1H  desc3
3       ABCD1  desc4
4  ABCD1H,LMN  desc5

I want to keep only the rows whose product is ABCD1, or ABCD1 followed by any other product ID, but not ABCD1H. How can I select such rows? For the example above, I want the output to be:

>>> df
     product   desc
1  ABCD1,XYZ  desc2
3      ABCD1  desc4

This is what I have tried so far, but it does not work:

df2 = df.loc[df['product'].str.contains('ABCD1')]

It also includes ABCD1H in its results, which I don't want.

Answer 1

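Use "\b" in the regex; it matches a word boundary. After "ABCD1" it requires the next character to be a non-word character (such as a comma) or the end of the string, so "ABCD1H" no longer matches: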

df[df['product'].str.contains(r'ABCD1\b')]

Output:

     product   desc
1  ABCD1,XYZ  desc2
3      ABCD1  desc4
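
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the question's sample data; the names mask, result and strict are only illustrative. The second pattern, with a leading "\b" as well, is an assumption about what you might want if a product ID could ever appear as the tail of a longer ID (for example a hypothetical "XABCD1"); the question does not say whether that can happen.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "product": ["ABCD", "ABCD1,XYZ", "ABCD1H", "ABCD1", "ABCD1H,LMN"],
    "desc": ["desc1", "desc2", "desc3", "desc4", "desc5"],
})

# r"ABCD1\b": the trailing \b only matches where "1" is followed by a
# non-word character (e.g. ",") or by the end of the string.
mask = df["product"].str.contains(r"ABCD1\b")
result = df[mask]
print(result)

# Stricter variant (an assumption, not part of the original answer):
# a leading \b also rejects IDs that merely end in "ABCD1".
strict = df[df["product"].str.contains(r"\bABCD1\b")]
print(strict)
```

On the sample data both patterns select the same rows. Note that "\b" treats letters, digits and the underscore as word characters, so this relies on product IDs being separated by something like a comma or whitespace.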
