
Why isn't my regex working with str.contains?

I have a very simple search string. I am looking for a shop called "Lidl".

My DataFrame:

  term_location  amount
0          Lidl    2.28
1          Lidl   16.97
2          Lidl    2.28
3          Lidl   16.97
4          Lidl   16.97
5          Lidl   16.97
6          Lidl   16.97
7          Lidl   16.97
8          Lidl   16.97
9          Lidl   16.97

Here is my regex version of the search for Lidl:

r = r'\blidl\b'

r = re.compile(r)


df[df.term_location.str.contains(r,re.IGNORECASE,na=False)]

This returns an empty DataFrame.

However, if I pass the plain string to str.contains(), it works and I get back the DataFrame of Lidl rows:

df[df.term_location.str.contains('lidl',case=False,na=False)]

I would prefer to use a regex, as I have a few more conditions to build into the query.

So what's happening? I can't figure it out.

Practice data for pd.DataFrame.from_dict():

{'term_location': {0: 'Lidl',
  1: 'Lidl',
  2: 'Lidl',
  3: 'Lidl',
  4: 'Lidl',
  5: 'Lidl',
  6: 'Lidl',
  7: 'Lidl',
  8: 'Lidl',
  9: 'Lidl'},
 'amount': {0: 2.28,
  1: 16.97,
  2: 2.28,
  3: 16.97,
  4: 16.97,
  5: 16.97,
  6: 16.97,
  7: 16.97,
  8: 16.97,
  9: 16.97}}

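Your regular expression is not working because it only matches the word "lidl" exactly as written, in lowercase. The re.IGNORECASE in your call never reaches the regex: str.contains takes its arguments in the order pat, case, flags, ..., so a second positional argument is bound to case, and a truthy value there simply means a case-sensitive search.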
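A pure-Python sketch of that argument binding, assuming only the positional order pat, case, flags from the pandas signature. contains_like is a hypothetical stand-in, not the pandas implementation; it mimics how case=False is folded into re.IGNORECASE before the pattern is compiled.

```python
import re


# Hypothetical stand-in: mirrors only the positional order
# (pat, case, flags) of pandas' Series.str.contains. Not pandas code.
def contains_like(text, pat, case=True, flags=0):
    if not case:
        # case=False is turned into a regex flag.
        flags |= re.IGNORECASE
    # re.compile returns an already-compiled pattern unchanged when flags == 0,
    # and raises ValueError if extra flags are given with a compiled pattern.
    regex = re.compile(pat, flags)
    return regex.search(text) is not None


pattern = re.compile(r"\blidl\b")

# The question's call shape: re.IGNORECASE lands in `case` (truthy), not `flags`.
print(contains_like("Lidl", pattern, re.IGNORECASE))            # False
# Passing the flag by keyword, or case=False, makes the search case-insensitive.
print(contains_like("Lidl", r"\blidl\b", flags=re.IGNORECASE))  # True
print(contains_like("Lidl", r"\blidl\b", case=False))           # True
```

Note that combining a compiled pattern with case=False fails in this sketch, because re.compile refuses extra flags for an already-compiled pattern.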

You should either change the first character of the word to uppercase:

re.compile(r"\bLidl\b")

or use the re.IGNORECASE flag to match the word regardless of its case:

re.compile(r"\blidl\b", re.IGNORECASE)
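Both fixes can be checked with the re module alone; the sample names below are made up for illustration.

```python
import re

upper = re.compile(r"\bLidl\b")                  # matches only the exact capitalisation
nocase = re.compile(r"\blidl\b", re.IGNORECASE)  # matches any capitalisation

for name in ["Lidl", "LIDL", "lidl"]:
    print(name, bool(upper.search(name)), bool(nocase.search(name)))
# Lidl True True
# LIDL False True
# lidl False True
```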

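Keep in mind that \b matches a word boundary: a position with a word character (letter, digit, or underscore) on one side and a non-word character, or the start or end of the text, on the other. For example, "_Lidl" wouldn't match either of the regular expressions above, because the underscore is itself a word character, so there is no boundary before the "L".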
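A quick check of where \b does and does not find a boundary; the strings are invented examples.

```python
import re

pattern = re.compile(r"\blidl\b", re.IGNORECASE)

samples = ["Lidl", "Lidl Express", "Aldi/Lidl", "_Lidl", "Lidls", "Lidl_2"]
# \b needs a word character on one side and a non-word character
# (or the start/end of the string) on the other.
results = {s: bool(pattern.search(s)) for s in samples}
print(results)
# {'Lidl': True, 'Lidl Express': True, 'Aldi/Lidl': True, '_Lidl': False, 'Lidls': False, 'Lidl_2': False}
```

The slash in "Aldi/Lidl" is a non-word character, so it creates a boundary; the underscore and the trailing "s" do not.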

Use a string literal as the pattern argument (it will be parsed as a regular expression) and pass case as a keyword argument:

df[df.term_location.str.contains(r'\blidl\b',case=False,na=False)]
                                   ^^^^^^^^^ 

case=False has the same effect as passing flags=re.IGNORECASE.

Alternatively, use the inline (?i) flag:

df[df.term_location.str.contains(r'(?i)\blidl\b',na=False)]
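The inline flag can be tried with the re module directly. Since Python 3.11, a global inline flag such as (?i) must appear at the very start of the pattern; the scoped form (?i:...) applies the flag to one group only. The strings are invented examples.

```python
import re

# Global inline flag: must be the first thing in the pattern.
print(bool(re.search(r"(?i)\blidl\b", "LIDL Express")))  # True

# Scoped inline flag: only the group is case-insensitive.
print(bool(re.search(r"\b(?i:lidl)\b", "LiDl")))          # True

# A global flag placed later is rejected on Python 3.11+
# (older versions only emit a DeprecationWarning).
try:
    re.compile(r"\b(?i)lidl\b")
except re.error as exc:
    print("rejected:", exc)
```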

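Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.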

 
© 2020-2024 STACKOOM.COM