Why does the regular expression "\bpattern\b" behave inconsistently in Python?
I am using Python 3 to demonstrate. Here are two example strings:
a = "learning is learn and elearn"
s = "@wen is @ and wen@"
I want an exact match of "learn" and "@", i.e. without matching inside "learning" (or "@wen") or "elearn" (or "wen@"). Therefore, I should get 'learn' and '@'.
re.findall(r'\blearn\b', a) # works
['learn']
or
re.sub(r'\blearn\b', 'z', a) # works
'learning is z and elearn'
re.findall(r'\b@\b', s) # not working
[]
or
re.sub(r'\b@\b', 'z', s) # not working
'@wen is @ and wen@'
The Python re documentation describes \b as follows:

\b
Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string.
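To make that definition concrete, here is a small sketch that lists every position in the example string s where \b matches. The variable name boundaries is only for illustration and is not part of the original question.

```python
import re

s = "@wen is @ and wen@"

# Collect the index of every zero-width match of \b in s.
boundaries = [m.start() for m in re.finditer(r'\b', s)]
print(boundaries)  # [1, 4, 5, 7, 10, 13, 14, 17]

# The standalone "@" sits at index 8, so \b@\b would need a
# boundary at index 8 (before it) and at index 9 (after it).
print(8 in boundaries, 9 in boundaries)  # False False
```

Every boundary in the list sits next to a letter. Neither side of the standalone "@" appears, which is why \b@\b finds nothing.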
In your example, the standalone @ is a non-alphanumeric (and non-underscore) character surrounded by other non-alphanumeric characters (the spaces on either side). Because there are no word characters next to it, there is no word boundary on either side, so \b will not match.
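One possible workaround, assuming "exact match" means the token is delimited by whitespace or by the start/end of the string, is to replace \b with lookarounds. The helper name find_token below is hypothetical and only serves this sketch.

```python
import re

def find_token(token, text):
    """Return occurrences of token delimited by whitespace or the string ends."""
    # (?<!\S) means "not preceded by a non-space character";
    # (?!\S) means "not followed by a non-space character".
    # re.escape guards against regex metacharacters in the token.
    pattern = r'(?<!\S)' + re.escape(token) + r'(?!\S)'
    return re.findall(pattern, text)

a = "learning is learn and elearn"
s = "@wen is @ and wen@"

print(find_token("learn", a))  # ['learn']
print(find_token("@", s))      # ['@']

# The same lookarounds work with re.sub.
replaced = re.sub(r'(?<!\S)@(?!\S)', 'z', s)
print(replaced)  # @wen is z and wen@
```

Unlike \b, these lookarounds do not depend on the token containing word characters. The trade-off is that a token followed by punctuation (for example "learn,") would no longer match. If that matters, (?<!\w) and (?!\w) can be used instead.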