
Why inconsistent regular expression "\bpattern\b" behavior in Python?

I am using Python 3 to demonstrate. Here are two example strings:

a = "learning is learn and elearn"

s = "@wen is @ and wen@"

I want to match "learn" and "@" exactly, i.e., without picking up learning (or @wen) or elearn (or wen@). Therefore, I should get 'learn' and '@'.

re.findall(r'\blearn\b', a) # works
['learn']

or

re.sub(r'\blearn\b', 'z', a) # works
'learning is z and elearn'


re.findall(r'\b@\b', s) # not working
[]

or

re.sub(r'\b@\b', 'z', s) # not working
'@wen is @ and wen@'

From the docs:

\b Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string.
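To make that definition concrete, here is a small sketch that lists every offset in the question's string s where \b matches. The variable names boundaries and standalone are made up for this illustration; only re.finditer and str.index are standard calls.

```python
import re

s = "@wen is @ and wen@"

# Offsets where \b matches: a word character on exactly one side.
boundaries = [m.start() for m in re.finditer(r"\b", s)]
print(boundaries)  # [1, 4, 5, 7, 10, 13, 14, 17]

# The standalone "@" sits at offset 8, between two spaces.
standalone = s.index(" @ ") + 1
# Neither the offset before it (8) nor the one after it (9) is a boundary.
print(standalone in boundaries, standalone + 1 in boundaries)  # False False
```

Every listed offset touches one of the words wen, is, and or wen; none of them touches the standalone @.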

In your example, the standalone @ is a non-alphanumeric (and non-underscore) character surrounded by spaces, which are also non-word characters. Because there is no word character on either side of it, there is no word boundary next to it, so \b will not match. In @wen and wen@ there is a boundary on only one side of the @, so a pattern that requires \b on both sides fails there too.
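One possible workaround, assuming that "exact match" here means "delimited by whitespace or the ends of the string", is to replace \b with negative lookarounds on \S. The helper name find_exact is hypothetical and only serves this sketch; it is not part of the re module.

```python
import re

def find_exact(token, text):
    """Hypothetical helper: match token only when it is delimited by
    whitespace or by the start/end of the string."""
    # (?<!\S) = not preceded by a non-space; (?!\S) = not followed by one.
    pattern = r"(?<!\S)" + re.escape(token) + r"(?!\S)"
    return re.findall(pattern, text)

a = "learning is learn and elearn"
s = "@wen is @ and wen@"

print(find_exact("learn", a))  # ['learn']
print(find_exact("@", s))      # ['@']

# The same pattern works with re.sub.
replaced = re.sub(r"(?<!\S)@(?!\S)", "z", s)
print(replaced)  # @wen is z and wen@
```

Note that this variant treats punctuation as part of the token, so it would not find learn in "learn.". If punctuation should count as a delimiter, (?<!\w) and (?!\w) can be used instead of the \S lookarounds.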
