I have a very simple search string. I am looking for a shop called "Lidl".
My dataframe:
term_location amount
0 Lidl 2.28
1 Lidl 16.97
2 Lidl 2.28
3 Lidl 16.97
4 Lidl 16.97
5 Lidl 16.97
6 Lidl 16.97
7 Lidl 16.97
8 Lidl 16.97
9 Lidl 16.97
Here I am searching for a regex version of Lidl:
r = r'\blidl\b'
r = re.compile(r)
df[df.term_location.str.contains(r,re.IGNORECASE,na=False)]
This brings back an empty dataframe.
However, if I just put the simple string in str.contains(), it works and I get the dataframe of Lidls returned:
df[df.term_location.str.contains('lidl',case=False,na=False)]
I would prefer to be able to use regex, as I have a few more conditions to build into the query.
So what's happening? I can't figure it out.
Practice dataframe for pd.DataFrame.from_dict():
{'term_location': {0: 'Lidl',
1: 'Lidl',
2: 'Lidl',
3: 'Lidl',
4: 'Lidl',
5: 'Lidl',
6: 'Lidl',
7: 'Lidl',
8: 'Lidl',
9: 'Lidl'},
'amount': {0: 2.28,
1: 16.97,
2: 2.28,
3: 16.97,
4: 16.97,
5: 16.97,
6: 16.97,
7: 16.97,
8: 16.97,
9: 16.97}}
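Your regular expression is not working because the compiled pattern matches "lidl" exactly as written (in lowercase), while the data contains "Lidl". The re.IGNORECASE you pass as the second positional argument of str.contains() lands in the case parameter, not in flags, so the compiled pattern stays case-sensitive.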
You should either change the first character of the word to uppercase:
re.compile(r"\bLidl\b")
or pass the re.IGNORECASE flag when compiling, so that the word matches regardless of case:
re.compile(r"\blidl\b", re.IGNORECASE)
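Keep in mind that \b matches a word boundary, that is, a position between a word character (letter, digit or underscore) and a non-word character or the start or end of the text. For example, "_Lidl" wouldn't match either of the regular expressions above, because the underscore counts as a word character and there is no boundary between "_" and "L".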
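To see both effects side by side, here is a small sketch that uses only the standard re module. The sample strings other than "Lidl" are made up for illustration.

```python
import re

# The pattern from the question, compiled with and without IGNORECASE.
case_sensitive = re.compile(r"\blidl\b")
case_insensitive = re.compile(r"\blidl\b", re.IGNORECASE)

# Hypothetical sample strings; only "Lidl" appears in the question's data.
samples = ["Lidl", "lidl", "Lidl GmbH", "_Lidl", "Lidl2"]

# Map each sample to (matches case-sensitively, matches case-insensitively).
results = {
    text: (
        case_sensitive.search(text) is not None,
        case_insensitive.search(text) is not None,
    )
    for text in samples
}

for text, (strict, relaxed) in results.items():
    print(f"{text!r}: case-sensitive={strict}, ignore-case={relaxed}")
```

"_Lidl" and "Lidl2" fail even with re.IGNORECASE, because "_" and "2" are word characters and so no \b boundary exists next to them.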
Pass a string literal as the pattern argument; it will be parsed as a regular expression:
df[df.term_location.str.contains(r'\blidl\b',case=False,na=False)]
^^^^^^^^^
The case=False argument acts identically to re.IGNORECASE.
Alternatively, use the inline (?i) flag:
df[df.term_location.str.contains(r'(?i)\blidl\b',na=False)]
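As a rough check, the variants can be compared on a cut-down copy of the question's dataframe. This is only a sketch: the "Aldi" row is a hypothetical addition so the filter has something to exclude, and every option is passed by keyword so that case and flags cannot be mixed up.

```python
import re

import pandas as pd

# The first three rows come from the question; the "Aldi" row is hypothetical.
df = pd.DataFrame(
    {
        "term_location": ["Lidl", "Lidl", "Lidl", "Aldi"],
        "amount": [2.28, 16.97, 2.28, 5.00],
    }
)

pattern = r"\blidl\b"

# Three ways to ask for a case-insensitive regex match.
by_case = df.term_location.str.contains(pattern, case=False, na=False)
by_flags = df.term_location.str.contains(pattern, flags=re.IGNORECASE, na=False)
by_inline = df.term_location.str.contains(r"(?i)\blidl\b", na=False)

# With no case-insensitive option, the lowercase pattern matches nothing.
strict = df.term_location.str.contains(pattern, na=False)

print(df[by_case])
```

All three case-insensitive masks select the same rows, so whichever reads best alongside your other conditions should work.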