
What do round brackets in Regex mean?

I don't understand why the regex ^(.)+$ matches the last letter of a string. I thought it would match the whole string.

Example in Python:

>>> import re
>>> text = 'This is a sentence'
>>> re.findall('^(.)+$', text)
['e']

If the pattern contains one or more capturing groups, re.findall returns something different:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
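The three result shapes described in that passage can be seen side by side in a minimal sketch. The patterns and variable names below are illustrative choices, not taken from the question:

```python
import re

text = 'This is a sentence'

# No group: each item is the whole match.
no_group = re.findall(r'\w+', text)

# One group: each item is only the text captured by that group.
one_group = re.findall(r'(\w)\w*', text)

# Two groups: each item is a tuple with one entry per group.
two_groups = re.findall(r'(\w)(\w*)', text)

print(no_group)    # ['This', 'is', 'a', 'sentence']
print(one_group)   # ['T', 'i', 'a', 's']
print(two_groups)  # [('T', 'his'), ('i', 's'), ('a', ''), ('s', 'entence')]
```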


And according to the MatchObject.group documentation:

If a group matches multiple times, only the last match is accessible.

If you want to get the whole string, use a non-capturing group:

>>> re.findall('^(?:.)+$', text)
['This is a sentence']

or don't use capturing groups at all:

>>> re.findall('^.+$', text)
['This is a sentence']

or change the group so that it captures everything:

>>> re.findall('^(.+)$', text)
['This is a sentence']
>>> re.findall('(^.+$)', text)
['This is a sentence']

Alternatively, you can use re.finditer, which yields match objects. Using MatchObject.group(), you can get the whole matched string:

>>> [m.group() for m in re.finditer('^(.)+$', text)]
['This is a sentence']

This happens because the capture group, (.), covers just one character. Because of the + quantifier, the regex engine goes on to match the whole string, and on each repetition the capture group is overwritten with the latest match. In the end, the capture group holds the last character.
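A small sketch of that behaviour, using a single match object so that the whole match and the group can be compared directly. The plain loop at the end is only an analogy for how the group is overwritten, not how the engine is implemented:

```python
import re

text = 'This is a sentence'
m = re.match(r'^(.)+$', text)

whole = m.group(0)       # everything the pattern matched
last = m.group(1)        # only the final repetition of the group survives
last_span = m.span(1)    # where that final repetition sits in the string

print(whole)      # This is a sentence
print(last)       # e
print(last_span)  # (17, 18)

# Analogy: a variable reassigned on every iteration keeps its last value.
captured = None
for ch in text:
    captured = ch
print(captured)   # e
```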

Even with findall, the first time the regex is applied, the + quantifier makes it match the whole string through to the end. Since the end of the string has then been reached, the regex is not applied again, and the call returns just one result.

If you remove the + quantifier (and the ^ and $ anchors, which would otherwise stop a one-character pattern from matching a longer string), then the first application of the regex matches just one character, so the regex is applied again and again until the whole string is consumed, and findall returns a list of all the characters in the string.

Note that + is greedy by default, so it matches all the characters up to the last one. Since only the dot is inside the capturing group, the regex above matches every character from the start but captures only the last one. Since findall gives preference to groups, it returns only the characters captured by the group.

re.findall('^(.+)$', text)
