pandas.Series.str.contains() is not finding a string which exists in the Series

Question

I'm trying to match a bunch of names from a list to the names in one of the columns of a Pandas DataFrame. A small part of the DataFrame is shown below:

The values in the "Object ID" column had some surrounding whitespace, which I stripped using this line:

df["Object ID"] = df["Object ID"].str.strip()

I am searching the column "Object ID" using the following line:

df[df["Object ID"].str.contains('EM* LkHA 115') == True]

The above line returns an empty DataFrame even though 'EM* LkHA 115' exists in the DataFrame, as shown below:

Any idea what I could be doing wrong? I would be happy to provide any further information if it would be of help.

Thanks in advance!

Answer 1

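You have to escape the '*' character. By default, str.contains() treats its pattern as a regular expression, so 'M*' means "zero or more M" rather than a literal asterisk, and the pattern never matches the text 'EM* LkHA 115'.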

df[df["Object ID"].str.contains(r'EM\* LkHA 115')]

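Also, you don't need the == True: str.contains() already returns a boolean mask, provided the column has no missing values (if it does, pass na=False).

As @MustafaAydın says in the comments, you can use re.escape from the standard library to escape the pattern dynamically: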

import re

df[df["Object ID"].str.contains(re.escape('EM* LkHA 115'))]
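
To see the difference side by side, here is a minimal sketch. The sample rows and the names list are made up for illustration (the real DataFrame is not shown in the question); regex=False is a documented parameter of str.contains() that turns off regex matching entirely, which may be simpler when no regex features are needed.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the asker's "Object ID" column.
df = pd.DataFrame({"Object ID": ["EM* LkHA 115", "HD 12345", "EM* AS 209"]})
target = "EM* LkHA 115"

# Default regex=True: "M*" is a quantifier, so the literal "*" is never matched.
as_regex = df["Object ID"].str.contains(target)

# re.escape makes every character in the pattern literal.
escaped = df["Object ID"].str.contains(re.escape(target))

# regex=False skips the regex engine and does a plain substring test.
literal = df["Object ID"].str.contains(target, regex=False)

print(as_regex.tolist())  # [False, False, False]
print(escaped.tolist())   # [True, False, False]
print(literal.tolist())   # [True, False, False]

# Matching several names at once: escape each name, then join with "|".
names = ["EM* LkHA 115", "EM* AS 209"]  # hypothetical list of names
pattern = "|".join(re.escape(name) for name in names)
matches = df[df["Object ID"].str.contains(pattern)]
print(matches["Object ID"].tolist())  # ['EM* LkHA 115', 'EM* AS 209']
```

Both the escaped and the regex=False versions do substring matching, so they would also match a longer ID that merely contains the name.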

solution1
2 ACCEPTED 2021-06-11 12:03:10
