
Why isn't my regex working with str.contains?

I have a very simple search string. I am looking for a shop called "Lidl".

My dataframe:

  term_location  amount
0          Lidl    2.28
1          Lidl   16.97
2          Lidl    2.28
3          Lidl   16.97
4          Lidl   16.97
5          Lidl   16.97
6          Lidl   16.97
7          Lidl   16.97
8          Lidl   16.97
9          Lidl   16.97

Here I am searching for a regex version of Lidl:

r = r'\blidl\b'

r = re.compile(r)


df[df.term_location.str.contains(r,re.IGNORECASE,na=False)]

This brings back an empty dataframe.

However, if I just put the simple string in str.contains(), it works and I get the dataframe of Lidls returned:

df[df.term_location.str.contains('lidl',case=False,na=False)]

I would prefer to be able to use regex, as I have a few more conditions to build into the query.

So what's happening? I can't figure it out.

Practice dataframe for pd.DataFrame.from_dict():

{'term_location': {0: 'Lidl',
  1: 'Lidl',
  2: 'Lidl',
  3: 'Lidl',
  4: 'Lidl',
  5: 'Lidl',
  6: 'Lidl',
  7: 'Lidl',
  8: 'Lidl',
  9: 'Lidl'},
 'amount': {0: 2.28,
  1: 16.97,
  2: 2.28,
  3: 16.97,
  4: 16.97,
  5: 16.97,
  6: 16.97,
  7: 16.97,
  8: 16.97,
  9: 16.97}}

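Your regular expression is not matching because the compiled pattern is case-sensitive: it looks for "lidl" in lowercase, while the data contains "Lidl". The re.IGNORECASE in your call does not help, because the second positional parameter of str.contains is case, not flags, so the flag is never applied to the pattern.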

You should either change the first character of the word to uppercase:

re.compile(r"\bLidl\b")

or pass the re.IGNORECASE flag when compiling, so that the word matches regardless of case:

re.compile(r"\blidl\b", re.IGNORECASE)
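As a minimal sketch of the difference, the snippet below passes both compiled patterns to str.contains. The three-row frame is a shortened copy of the question's data, the names sensitive, insensitive, no_hits and hits are only illustrative, and dtype=object is an assumption made here so that matching goes through Python's re module.

```python
import re
import pandas as pd

# Shortened copy of the question's data. dtype=object is an assumption made
# for this sketch so the compiled pattern is handled by Python's re module.
df = pd.DataFrame({
    "term_location": pd.Series(["Lidl", "Lidl", "Lidl"], dtype=object),
    "amount": [2.28, 16.97, 2.28],
})

sensitive = re.compile(r"\blidl\b")                   # matches lowercase "lidl" only
insensitive = re.compile(r"\blidl\b", re.IGNORECASE)  # flag is stored in the pattern

# No extra case/flags arguments: the compiled pattern carries its own flags.
no_hits = df[df.term_location.str.contains(sensitive, na=False)]
hits = df[df.term_location.str.contains(insensitive, na=False)]

print(len(no_hits), len(hits))
```

With this data the first selection should come back empty and the second should contain every row.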

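Keep in mind that \b matches only at a word boundary, that is, between a word character (a letter, digit, or underscore) and a non-word character or the start or end of the string. For example, "_Lidl" wouldn't match either of the regular expressions above, because the underscore counts as a word character.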
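To see how \b behaves, here is a small sketch using only the re module. The sample strings are made up for illustration and are not taken from the question's data.

```python
import re

pattern = re.compile(r"\blidl\b", re.IGNORECASE)

# Hypothetical sample strings chosen to exercise the word boundaries.
samples = ["Lidl", "LIDL store", "at Lidl.", "_Lidl", "Lidl2", "Lidls"]

# search() looks for the pattern anywhere in the string.
results = {s: bool(pattern.search(s)) for s in samples}

for s, matched in results.items():
    print(f"{s!r}: {matched}")
```

The first three strings match. The last three do not, because an underscore, a digit, or another letter sits directly next to "Lidl", so there is no word boundary at that position.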

Use a string literal as the pattern argument; it will be parsed as a regular expression:

df[df.term_location.str.contains(r'\blidl\b',case=False,na=False)]
                                   ^^^^^^^^^ 

Here case=False acts identically to re.IGNORECASE.

Alternatively, use the inline flag (?i) at the start of the pattern:

df[df.term_location.str.contains(r'(?i)\blidl\b',na=False)]
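The following sketch compares these spellings side by side, together with the flags keyword of str.contains. The frame and its values are hypothetical, chosen to include a missing value and a non-matching "_Lidl", and the variable names are only illustrative.

```python
import re
import pandas as pd

# Hypothetical data: mixed case, another shop, a missing value, and "_Lidl".
df = pd.DataFrame({
    "term_location": ["Lidl", "LIDL GB", "Aldi", None, "_Lidl"],
    "amount": [2.28, 16.97, 5.00, 1.00, 3.50],
})
col = df["term_location"]

# Three ways to request case-insensitive matching with a string pattern.
by_case = col.str.contains(r"\blidl\b", case=False, na=False)
by_inline = col.str.contains(r"(?i)\blidl\b", na=False)
by_flags = col.str.contains(r"\blidl\b", flags=re.IGNORECASE, na=False)

matches = df[by_case]
print(matches)
```

All three masks should select the same rows, here the first two. Passing the flag by keyword (flags=re.IGNORECASE) avoids it being taken as the case argument.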
