This pattern is interpreted as a regular expression, and has match groups - but there are no capturing groups

Question

I'm migrating a script to a new Python environment. I don't like the regex (I would use \b instead), but I want to change the existing code as little as possible.

I get this warning when executing the script:

UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
  word_in_data = self.data['text'].str.contains(r"(?:^|[^a-zA-Z0-9])"+word+r"(?:$|[^a-zA-Z0-9])", na=False, regex=True).copy()

This is the line containing the regex:

self.data['text'].str.contains(r"(?:^|[^a-zA-Z0-9])"+word+r"(?:$|[^a-zA-Z0-9])", na=False, regex=True).copy()

It only uses non-capturing groups, (?:...), so why do I get this warning?

Thanks!

Answer 1

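If word contains parentheses, the warning is raised: unescaped parentheses in word form a capturing group inside the combined pattern, even though the surrounding groups are non-capturing. Try escaping word with re.escape: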

# Simple word
word = 'fractured'
df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])"+word+r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)

0     True
1    False
2    False
3     True
4    False
5     True
Name: text, dtype: bool

# Simple word with parentheses (they are read as a capturing group)
word = '(fractured)'
df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])"+word+r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)

UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
  df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])"+word+r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)

0     True
1    False
2    False
3     True
4    False
5     True
Name: text, dtype: bool

# Simple word with parentheses, but escaped
import re

word = '(fractured)'
word = re.escape(word)  # the parentheses are now matched literally, so no capturing group is created
df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])"+word+r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)

0    False
1    False
2    False
3    False
4    False
5    False
Name: text, dtype: bool
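
To see where the capturing group comes from without needing the original DataFrame, the pattern can be inspected with the standard re module alone. This is only a sketch: build_pattern and capture_group_count are hypothetical helper names, the sample strings are made up, and it assumes (as the warning text suggests) that pandas warns whenever the compiled pattern has at least one capturing group.

```python
import re

# Prefix and suffix copied from the question; both are non-capturing groups.
PREFIX = r"(?:^|[^a-zA-Z0-9])"
SUFFIX = r"(?:$|[^a-zA-Z0-9])"


def build_pattern(word, escape=False):
    """Hypothetical helper: wrap `word` in the question's boundary groups."""
    if escape:
        word = re.escape(word)  # regex metacharacters in `word` become literals
    return PREFIX + word + SUFFIX


def capture_group_count(pattern):
    """Return the number of capturing groups in the compiled pattern."""
    return re.compile(pattern).groups


plain = build_pattern("fractured")
unescaped = build_pattern("(fractured)")
escaped = build_pattern("(fractured)", escape=True)

print(capture_group_count(plain))      # 0: only non-capturing groups
print(capture_group_count(unescaped))  # 1: the parentheses from `word`
print(capture_group_count(escaped))    # 0: parentheses are literals now

# Made-up sample strings, to show that escaping also changes what matches.
print(bool(re.search(unescaped, "a fractured bone")))    # True
print(bool(re.search(escaped, "a fractured bone")))      # False
print(bool(re.search(escaped, "a (fractured) bone")))    # True
```

Escaping therefore removes the warning, but it also makes the parentheses part of the text that must be present, which would explain why the escaped example above returns False for every row.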

Solution 1
1 2023-01-21 08:04:00
