Regular expression - range of the match

I have the following regular expression:

re.findall(r'(\b[A-Za-z][a-z]{3,10}\b)', string_var)

I expected this regular expression to return matches ranging in length from 3 to 10. However, it returns matches for words ranging in length from 4 to 11.

Do we thus read the above regular expression as matching words that start with an upper-case or lower-case letter, followed by 3 to 10 lower-case letters? In other words, is the first letter the extra letter that extends the range?

Thanks.

Yes.

Your regex is

(\b[A-Za-z][a-z]{3,10}\b)

Now, the grouping parentheses don't affect what is matched, so we can ignore them. And \b is a "zero-width" matching operator: it matches the transition between a word character and a non-word character (or the start or end of the string), so it doesn't correspond to any characters itself. For the purpose of counting length, we can ignore it as well. That leaves this:

[A-Za-z][a-z]{3,10}
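
As a quick check of the two simplifications above, here is a minimal sketch. The sample string `text` and the variable names are made up for illustration. It compares the pattern with and without the capturing group, and shows that `\b` matches an empty span.

```python
import re

# Hypothetical sample input, chosen only for illustration.
text = "Cat sat on a windowsill yesterday"

# With one capturing group, findall returns the group's text; with no
# group, it returns the whole match. The group wraps the entire pattern,
# so both calls return the same strings.
with_group = re.findall(r'(\b[A-Za-z][a-z]{3,10}\b)', text)
without_group = re.findall(r'\b[A-Za-z][a-z]{3,10}\b', text)
print(with_group)     # ['windowsill', 'yesterday']
print(without_group)  # ['windowsill', 'yesterday']

# \b consumes no characters: the match starts and ends at the same index.
boundary_span = re.search(r'\b', 'word').span()
print(boundary_span)  # (0, 0)
```

Note that `\b` still restricts where a match may start and end, which is why "Cat" and "sat" are not picked up as parts of longer matches. It just contributes nothing to the length of the match.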

The remaining pattern is two character classes, with a repetition specifier suffix on the second:

  1. [A-Za-z] - matches one character, upper or lower case Latin alphabetic.

  2. [a-z]{3,10} - matches at least 3, at most 10 characters, lowercase a-z.

So in total, you are matching 1 + [3,10] characters. Your minimal match will be 4 characters, and your maximal match will be 11.
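
The arithmetic can be confirmed with a short sketch. The test strings below are invented for illustration, and the adjusted quantifier `{2,9}` is one possible way to get total lengths of 3 to 10, assuming that is what was intended. It is not something stated in the question.

```python
import re

# Hypothetical words of length 3, 4, 11 and 12.
sample = "abc abcd abcdefghijk abcdefghijkl"

# Original pattern: 1 leading letter + 3..10 lowercase letters = 4..11 total.
original = re.findall(r'\b[A-Za-z][a-z]{3,10}\b', sample)
print([(w, len(w)) for w in original])
# [('abcd', 4), ('abcdefghijk', 11)]

# Hypothetical words of length 3, 4, 10 and 11.
sample2 = "abc abcd abcdefghij abcdefghijk"

# Assumed fix: shift the quantifier down by one so that the total,
# including the first letter, is 3..10 characters.
adjusted = re.findall(r'\b[A-Za-z][a-z]{2,9}\b', sample2)
print([(w, len(w)) for w in adjusted])
# [('abc', 3), ('abcd', 4), ('abcdefghij', 10)]
```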
