This pattern is interpreted as a regular expression, and has match groups - but with no capturing group
I'm migrating a script to a new Python environment. I don't like the regex (I would use \b instead), but I want to change the existing code as little as possible.
I get this warning when executing the script:
UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
word_in_data = self.data['text'].str.contains(r"(?:^|[^a-zA-Z0-9])"+word+r"(?:$|[^a-zA-Z0-9])", na=False, regex=True).copy()
This is the line containing the regex:
self.data['text'].str.contains(r"(?:^|[^a-zA-Z0-9])"+word+r"(?:$|[^a-zA-Z0-9])", na=False, regex=True).copy()
It only uses non-capturing groups, (?:...), so why do I get this warning?
Thanks!
If word contains parentheses, (), the warning is raised, because they are interpreted as a capturing group inside the pattern. Try escaping word with re.escape:
# Simple word
word = 'fractured'
df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])"+word+r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)
0 True
1 False
2 False
3 True
4 False
5 True
Name: text, dtype: bool
# Simple word with parentheses (interpreted as a capturing group)
word = '(fractured)'
df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])"+word+r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)
UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])"+word+r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)
0 True
1 False
2 False
3 True
4 False
5 True
Name: text, dtype: bool
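# Simple word with parentheses, but escaped
# No warning is raised. The result is all False because the escaped pattern
# only matches the literal text "(fractured)", parentheses included.
import re
word = '(fractured)'
word = re.escape(word)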
df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])"+word+r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)
0 False
1 False
2 False
3 False
4 False
5 False
Name: text, dtype: bool
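The behaviour above can be reproduced without pandas, using only the re module. This is a minimal sketch, assuming that pandas decides whether to warn by looking at the number of capturing groups in the compiled pattern (that is my understanding of str.contains, not something confirmed in this thread). The names PREFIX, SUFFIX and build_pattern are made up for illustration and do not come from the original script.

```python
import re

# The two non-capturing boundary groups from the question.
PREFIX = r"(?:^|[^a-zA-Z0-9])"
SUFFIX = r"(?:$|[^a-zA-Z0-9])"


def build_pattern(word, escape=False):
    """Wrap word in the boundary groups, optionally escaping it first."""
    if escape:
        word = re.escape(word)  # "(fractured)" becomes r"\(fractured\)"
    return PREFIX + word + SUFFIX


plain = re.compile(build_pattern("fractured"))
unescaped = re.compile(build_pattern("(fractured)"))
escaped = re.compile(build_pattern("(fractured)", escape=True))

# Number of capturing groups in each compiled pattern.
print(plain.groups, unescaped.groups, escaped.groups)  # 0 1 0

# Unescaped: the parentheses form a group, so the bare word still matches.
print(bool(unescaped.search("a fractured bone")))  # True

# Escaped: the parentheses must appear literally in the text.
print(bool(escaped.search("a fractured bone")))    # False
print(bool(escaped.search("a (fractured) bone")))  # True
```

So the (?:...) groups in the original line are not the problem; the capturing group comes from whatever word contains. Whether to escape depends on what word is meant to be: a literal string to search for (escape it) or a regex fragment (keep it as is and accept or silence the warning).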