Regular expression - range of the match
I have the following regular expression:
re.findall(r'(\b[A-Za-z][a-z]{3,10}\b)', string_var)
I expected this regular expression to return matches ranging in length from 3 to 10. However, it returns matches for words ranging in length from 4 to 11.
Should the above regular expression therefore be read as matching words that start with an upper case or lower case letter, followed by 3 to 10 more letters? In other words, is the first letter the extra character that extends the range?
Thanks.
Yes.
Your regex is
(\b[A-Za-z][a-z]{3,10}\b)
Now, the grouping parens don't affect what is matched, so we can ignore them. And the \b is a "zero-width" matching operator - it matches the position between a word character and a non-word character (or the start or end of the string) - so it doesn't actually correspond to any characters. We can ignore it too when counting length. That leaves this:
[A-Za-z][a-z]{3,10}
This is two character classes, with a repetition specifier suffix on the second:
[A-Za-z] - matches one character, an upper or lower case Latin letter.
[a-z]{3,10} - matches at least 3 and at most 10 characters, each a lowercase letter a-z.
So in total, you are matching 1 + [3,10] characters. Your minimal match will be 4 characters, and your maximal match will be 11.
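A quick way to check the arithmetic is to run the pattern against words of known length. The sketch below is only an illustration: the sample string is made up, and the {2,9} quantifier is one assumed way to get the 3 to 10 range the question asked for, by counting the first letter as part of the total.

```python
import re

# The pattern from the question; the outer group is dropped because it
# does not change what is matched.
original = re.compile(r'\b[A-Za-z][a-z]{3,10}\b')

# Assumed fix: lower both bounds by one so that the leading letter
# is included in the intended 3-to-10 total.
adjusted = re.compile(r'\b[A-Za-z][a-z]{2,9}\b')

# Made-up sample words of length 2, 3, 4, 10, 11 and 12.
sample = "ab abc abcd abcdefghij abcdefghijk abcdefghijkl"

original_lengths = sorted({len(w) for w in original.findall(sample)})
adjusted_lengths = sorted({len(w) for w in adjusted.findall(sample)})

print(original_lengths)  # [4, 10, 11]
print(adjusted_lengths)  # [3, 4, 10]

# \b is zero-width: every match it produces is an empty string.
boundary_match_lengths = [len(m.group()) for m in re.finditer(r'\b', "abcd")]
print(boundary_match_lengths)  # [0, 0]
```

With the original pattern the 3-letter word is skipped and the 11-letter word is accepted; with {2,9} the accepted lengths shift down by one.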