
Is this result for regular expression backreferencing correct?

I used JavaScript in the command-line client of MongoDB v2.2.4 to run the following regular expression backreferencing:

> /([AB])([AB])/("BA")
[ "BA", "B", "A" ]

I had expected to get ["B", "A"], but I got an extra element, "BA", at the beginning of the array. When I tried the same regular expression in Python, the result was what I expected:

>>> re.search('([AB])([AB])','BA').groups()
('B', 'A')

So, is the result of this regular expression backreferencing in MongoDB's JavaScript wrong?

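No. In JavaScript (as in many other regex engines), group 0 is the entire match, and the capturing groups are numbered from 1. Python's re module numbers groups the same way: group 0 is the whole match, and the capturing groups start at 1.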

The MongoDB result includes the whole matched string, or group 0, as well as groups 1 and 2.
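To see that both engines agree, a minimal Python sketch can rebuild the array the MongoDB shell printed by putting group 0 in front of the captured groups. The helper name `js_style_match` is hypothetical, chosen here only for illustration.

```python
import re

def js_style_match(pattern, text):
    """Hypothetical helper: return [whole match, group 1, group 2, ...]
    like a JavaScript match array, or None when there is no match."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # group(0) is the whole match; groups() holds only the capturing groups
    return [m.group(0), *m.groups()]

print(js_style_match(r'([AB])([AB])', 'BA'))  # ['BA', 'B', 'A']
```

The output has the same shape as `[ "BA", "B", "A" ]` from the mongo shell; only the way each language exposes group 0 differs.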

The Python .groups() method returns only the captured groups. Called without an argument, the .group() method returns group 0 as well:

>>> re.search('([AB])([AB])', 'BA').groups()
('B', 'A')
>>> re.search('([AB])([AB])', 'BA').group()
'BA'
>>> re.search('([AB])([AB])', 'BA').group(1)
'B'
>>> re.search('([AB])([AB])', 'BA').group(2)
'A'
>>> re.search('([AB])([AB])', 'BA').group(0)
'BA'

This is documented in the re module documentation:

Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern.

and for the .group() method:

Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned).
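The documented behaviour can be checked directly. This short sketch uses `Pattern.groups` (the number of capturing groups, not counting group 0) alongside the multi-argument form of `.group()` described in the quote above.

```python
import re

pattern = re.compile(r'([AB])([AB])')
m = pattern.search('BA')

# Number of capturing groups in the pattern; group 0 is not counted
print(pattern.groups)     # 2
# groups() returns groups 1 .. pattern.groups
print(m.groups())         # ('B', 'A')
# With no argument, group() defaults to group 0, the whole match
print(m.group())          # BA
# Several arguments give a tuple with one item per argument
print(m.group(0, 1, 2))   # ('BA', 'B', 'A')
```

`m.group(0, 1, 2)` produces the same three items, in the same order, as the JavaScript result array.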

Note that there are no back-references in your expression. A back-reference would look like this instead:

r'([AB])\1'

where \1 refers to capturing group 1, which here is the group just before it. A back-reference matches only the exact characters that the referenced group matched.

Demo:

>>> re.search(r'([AB])\1', 'BA')
>>> re.search(r'([AB])\1', 'BB')
<_sre.SRE_Match object at 0x107098210>

Note how only BB is matched, not BA.
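To make the difference between two independent groups and a back-reference explicit, this sketch runs both patterns over every two-letter string made of A and B, using `re.fullmatch` so the whole string must match.

```python
import re
from itertools import product

# All two-letter strings over {A, B}: AA, AB, BA, BB
candidates = [''.join(p) for p in product('AB', repeat=2)]

two_groups = re.compile(r'([AB])([AB])')  # two independent capturing groups
backref = re.compile(r'([AB])\1')         # second char must repeat group 1

for s in candidates:
    print(s, bool(two_groups.fullmatch(s)), bool(backref.fullmatch(s)))
# AA True True
# AB True False
# BA True False
# BB True True
```

The pattern from the question accepts all four strings, while the back-reference version accepts only the repeated letters.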

You can use named groups too:

'(?P<a_or_b>[AB])(?P=a_or_b)'

where a_or_b is the group name.
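A named group can be read back by name, and it keeps its number as well. This sketch uses the pattern above on an illustrative input string, `'xBBy'`, chosen here only to show that `search` finds the repeated letter anywhere in the string.

```python
import re

pattern = re.compile(r'(?P<a_or_b>[AB])(?P=a_or_b)')

m = pattern.search('xBBy')
print(m.group(0))            # BB
print(m.group('a_or_b'))     # B
print(m.group(1))            # B  (named groups are still numbered)
print(m.groupdict())         # {'a_or_b': 'B'}
print(pattern.search('BA'))  # None
```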

