Why isn't my regex working with str.contains?
I have a very simple search string. I am looking for a shop called "Lidl".
My dataframe:
term_location amount
0 Lidl 2.28
1 Lidl 16.97
2 Lidl 2.28
3 Lidl 16.97
4 Lidl 16.97
5 Lidl 16.97
6 Lidl 16.97
7 Lidl 16.97
8 Lidl 16.97
9 Lidl 16.97
Here I am searching for a regex version of Lidl:
import re
r = r'\blidl\b'
r = re.compile(r)
df[df.term_location.str.contains(r,re.IGNORECASE,na=False)]
This brings back an empty dataframe.
However, if I just put the simple string in str.contains(), it works and I get the dataframe of Lidls returned:
df[df.term_location.str.contains('lidl',case=False,na=False)]
I would prefer to be able to use regex, as I have a few more conditions to build into the query.
So what's happening? I can't figure it out.
Practice dataframe for pd.DataFrame.from_dict():
{'term_location': {0: 'Lidl',
1: 'Lidl',
2: 'Lidl',
3: 'Lidl',
4: 'Lidl',
5: 'Lidl',
6: 'Lidl',
7: 'Lidl',
8: 'Lidl',
9: 'Lidl'},
'amount': {0: 2.28,
1: 16.97,
2: 2.28,
3: 16.97,
4: 16.97,
5: 16.97,
6: 16.97,
7: 16.97,
8: 16.97,
9: 16.97}}
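A minimal reproduction of the problem, as a sketch assuming pandas is installed. It rebuilds the practice data with pd.DataFrame.from_dict, pins the column to dtype=object (an assumption, so the example exercises pandas' classic object-dtype string path), and checks which parameter the second positional argument of str.contains actually lands in.

```python
import inspect
import re

import pandas as pd

# Practice data from the question: ten "Lidl" rows.
amounts = [2.28, 16.97, 2.28] + [16.97] * 7
data = {
    "term_location": {i: "Lidl" for i in range(10)},
    "amount": dict(enumerate(amounts)),
}
# Assumption: pin object dtype so the classic string code path is used.
df = pd.DataFrame.from_dict(data).astype({"term_location": object})

# The second parameter of Series.str.contains is `case`, not `flags`.
params = list(inspect.signature(df["term_location"].str.contains).parameters)
print(params[:3])  # ['pat', 'case', 'flags']

# re.IGNORECASE is a non-zero int, so as a value for `case` it is truthy
# and means the same as the default case=True (case-sensitive).
print(bool(re.IGNORECASE))  # True

# Case-sensitive matching plus a pattern compiled without the flag:
# the lowercase pattern never matches "Lidl".
pattern = re.compile(r"\blidl\b")
mask = df["term_location"].str.contains(pattern, case=True, na=False)
print(df[mask].empty)  # True
```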
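Your regular expression is not working because you are trying to match the word "lidl" exactly as it is (in lowercase), while the column contains "Lidl". Passing re.IGNORECASE to str.contains doesn't change that: the second positional parameter of str.contains is case, not flags, so re.IGNORECASE (a non-zero, and therefore truthy, integer) is read as case=True and the search stays case-sensitive.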
You should either change the first character of the word to uppercase:
re.compile(r"\bLidl\b")
or compile the pattern with the re.IGNORECASE flag so that it matches the word regardless of its case:
re.compile(r"\blidl\b", re.IGNORECASE)
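A sketch comparing both options through str.contains, assuming pandas is installed; the frame with extra spellings and an "Aldi" row is hypothetical. The capitalised pattern works on the question's data only because every row is spelled exactly "Lidl", while the flagged pattern also catches other spellings. In both cases the matching behaviour lives in the compiled pattern, so case and flags are left at their defaults.

```python
import re

import pandas as pd

# Hypothetical data; dtype=object pins pandas' classic string code path.
df = pd.DataFrame({"term_location": ["Lidl", "LIDL", "lidl", "Aldi"]}, dtype=object)

# Option 1: capitalised pattern, still case-sensitive.
exact = re.compile(r"\bLidl\b")
# Option 2: case-insensitivity baked into the compiled pattern.
ignore_case = re.compile(r"\blidl\b", re.IGNORECASE)

exact_mask = df["term_location"].str.contains(exact, na=False)
ignore_case_mask = df["term_location"].str.contains(ignore_case, na=False)

print(exact_mask.tolist())        # [True, False, False, False]
print(ignore_case_mask.tolist())  # [True, True, True, False]
```

With a precompiled pattern it is safest not to also pass case=False or flags: on the object-dtype path pandas tries to recompile the pattern with those extra flags, and Python's re.compile raises ValueError when asked to add flags to an already compiled pattern.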
Keep in mind that \b matches a word boundary: a position between a word character (a letter, digit, or underscore) and a non-word character, or the start or end of the string. Because the underscore is a word character, "_Lidl" wouldn't match any of the regular expressions above.
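A quick check with the re module on a few hypothetical strings shows where \b does and doesn't match. The lookaround pattern at the end is one possible alternative, an assumption rather than part of the answer, for data where an underscore should act as a separator.

```python
import re

word = re.compile(r"\bLidl\b")

# Hypothetical inputs: \b needs a word/non-word transition (or the edge of
# the string) on each side of "Lidl". Letters, digits and "_" are word chars.
samples = ["Lidl", "Lidl Berlin", "Lidl-Express", "_Lidl", "Lidl2", "Lidls"]
found = {s: bool(word.search(s)) for s in samples}
print(found)  # "_Lidl", "Lidl2" and "Lidls" map to False, the rest to True

# Hypothetical alternative: only letters and digits block a match,
# so an underscore behaves like a separator.
alnum_bounded = re.compile(r"(?<![A-Za-z0-9])Lidl(?![A-Za-z0-9])")
print(bool(alnum_bounded.search("_Lidl")))  # True
```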
Use a string literal as the pattern argument; str.contains parses it as a regular expression by default (regex=True):
df[df.term_location.str.contains(r'\blidl\b',case=False,na=False)]
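A sketch of the string-literal version, assuming pandas is installed; the sample values are hypothetical. "Lidlstore" is rejected because \b requires a boundary after "lidl", and the missing value becomes False because of na=False.

```python
import pandas as pd

# Hypothetical data with mixed spellings and a missing value.
locations = pd.Series(["Lidl", "LIDL", "Lidlstore", None])

# Plain string pattern: case=False makes the match case-insensitive,
# and na=False turns the missing value into False instead of NaN.
mask = locations.str.contains(r"\blidl\b", case=False, na=False)
print(mask.tolist())  # [True, True, False, False]
```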
Passing case=False has the same effect as the re.IGNORECASE flag.
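To check that claim, a sketch (assuming pandas is installed, with hypothetical data) comparing case=False against passing the flag explicitly through the flags keyword. Naming the keyword matters: passed positionally as the second argument, re.IGNORECASE would land in case instead.

```python
import re

import pandas as pd

locations = pd.Series(["Lidl", "LIDL", "Aldi", None])  # hypothetical data

by_case = locations.str.contains(r"\blidl\b", case=False, na=False)
by_flag = locations.str.contains(r"\blidl\b", flags=re.IGNORECASE, na=False)

# Both ways of ignoring case select the same rows.
print(by_case.tolist())                      # [True, True, False, False]
print(by_case.tolist() == by_flag.tolist())  # True
```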
Alternatively, use the inline (?i) flag at the start of the pattern:
df[df.term_location.str.contains(r'(?i)\blidl\b',na=False)]
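A short sketch of the inline-flag version, assuming pandas is installed; the data is hypothetical. Because the flag is part of the pattern string, neither case=False nor flags is needed, and the same string works unchanged with the re module.

```python
import re

import pandas as pd

locations = pd.Series(["Lidl", "LIDL", "Aldi", None])  # hypothetical data

# (?i) at the very start of the pattern turns on case-insensitive matching.
inline = locations.str.contains(r"(?i)\blidl\b", na=False)
print(inline.tolist())  # [True, True, False, False]

# The same pattern string also works directly with the re module.
print(bool(re.search(r"(?i)\blidl\b", "LIDL Express")))  # True
```

Since Python 3.11, global inline flags such as (?i) are only accepted at the start of the pattern, so keep it first when adding further conditions.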