Why does the regular expression "\bpattern\b" behave inconsistently in Python?

Question

I am using Python 3 to demonstrate. Here are two example strings:

a = "learning is learn and elearn"

s = "@wen is @ and wen@"

I want an exact match of "learn" and "@", i.e., without extracting "learning" (or "@wen") or "elearn" (or "wen@"). Therefore, I should get 'learn' and '@'.

import re

re.findall(r'\blearn\b', a) # works
['learn']

or

re.sub(r'\blearn\b', 'z', a) # works
'learning is z and elearn'


re.findall(r'\b@\b', s) # not working
[]

or

re.sub(r'\b@\b', 'z', s) # not working
'@wen is @ and wen@'

Answer 1

From the docs:

\b Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string.
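To see that definition in action, here is a small sketch that lists every position in the string s from the question where \b matches. The variable name boundaries is only illustrative, and the snippet assumes Python 3.7 or later, where re.finditer reports empty matches at every position.

```python
import re

s = "@wen is @ and wen@"

# \b matches the empty string, so each match is a zero-width position.
# Collect the index of every position where a word boundary exists.
boundaries = [m.start() for m in re.finditer(r'\b', s)]

print(boundaries)
# [1, 4, 5, 7, 10, 13, 14, 17]
```

Positions 8 and 9, on either side of the standalone "@", are missing from the list: both neighbours there are non-word characters, so no boundary exists at those positions.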

In your example, the standalone @ is a non-alphanumeric (and non-underscore) character surrounded by other non-alphanumeric characters (spaces). Because there are no word characters next to it, there is no word boundary on either side, so \b will not match.
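One possible workaround, which is not part of the answer above, is to replace \b with lookarounds that only require that no word character is adjacent to the token. The sketch below assumes that this is what "exact match" should mean here; the helper name find_whole is hypothetical.

```python
import re

def find_whole(token, text):
    """Find `token` where it is not directly adjacent to a word character.

    Hypothetical helper: (?<!\\w) and (?!\\w) are lookarounds that succeed
    at the string edges and next to any non-word character, so they also
    work for tokens such as "@" that contain no word characters.
    """
    pattern = r'(?<!\w)' + re.escape(token) + r'(?!\w)'
    return re.findall(pattern, text)

a = "learning is learn and elearn"
s = "@wen is @ and wen@"

print(find_whole("learn", a))   # ['learn']
print(find_whole("@", s))       # ['@']

# The same pattern works with re.sub:
replaced = re.sub(r'(?<!\w)@(?!\w)', 'z', s)
print(replaced)                 # '@wen is z and wen@'
```

Note that this pattern would also match an "@" sitting between punctuation, as in "#@#". If the token must be delimited by whitespace only, (?<!\S) and (?!\S) could be used instead.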

2 2016-02-24 19:05:06
