
Why does \g<0> behave differently from \0 in re.sub?

I'm using Python 3.3.

re.sub("(.)(.)", r"\2\1\g<0>", "ab") returns baab

BUT

re.sub("(.)(.)", r"\2\1\0", "ab") returns ba (the actual value is 'ba\x00'; the trailing null character is invisible when printed)

Is this a bug in re.sub, or does re.sub deliberately not recognize \0 for some reason?
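Printing both results with repr() shows what each replacement actually produced, since a null character prints as nothing. This is a minimal sketch; the variable names are illustrative only.

```python
import re

# \g<0> in the replacement template refers to the whole match ("ab")
with_g0 = re.sub("(.)(.)", r"\2\1\g<0>", "ab")

# \0 in the replacement template is an octal escape for the null character,
# not a reference to group 0
with_backslash_0 = re.sub("(.)(.)", r"\2\1\0", "ab")

print(repr(with_g0))           # 'baab'
print(repr(with_backslash_0))  # 'ba\x00'
print(len(with_backslash_0))   # 3: "b", "a" and the null character
```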

As written on this page, \0 is interpreted as the null character (\x00), and group numbers start at 1 in Python (according to the re module documentation):

\number

Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but not 'thethe' (note the space after the group). This special sequence can only be used to match one of the first 99 groups. If the first digit of number is 0, or number is 3 octal digits long, it will not be interpreted as a group match, but as the character with octal value number. Inside the '[' and ']' of a character class, all numeric escapes are treated as characters.
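The rules quoted above (which describe \number inside a pattern) can be checked directly. This is a small sketch using only the standard re module; the octal example \101 is an illustrative choice, not taken from the documentation.

```python
import re

# \1 in a pattern matches the same text that group 1 captured
backref = re.compile(r"(.+) \1")
results = {text: bool(backref.fullmatch(text)) for text in ["the the", "55 55", "thethe"]}
print(results)  # {'the the': True, '55 55': True, 'thethe': False}

# A leading 0, or exactly three octal digits, gives a character escape instead
nul_is_char = bool(re.fullmatch(r"\0", "\x00"))   # \0 is chr(0), not "group 0"
octal_is_char = bool(re.fullmatch(r"\101", "A"))  # octal 101 == 65 == "A"
print(nul_is_char, octal_is_char)  # True True
```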

Also, according to the page linked above, this is not a bug but intended behaviour (which is clear, since it is documented).

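\0 is interpreted as an octal escape for the null character \x00, so re does not treat it as a reference to a capture group. To insert the whole match (group 0) in a replacement string, write \g<0>.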
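If the goal is to put the whole match into the replacement, a few approaches give the same result. The re.sub documentation describes \g<0> as substituting the entire matched substring; a replacement function using Match.group(0) does the same thing explicitly, and Match.expand applies the same template rules to a single match. This is a sketch, and swap_then_whole is a hypothetical helper name.

```python
import re

pattern = "(.)(.)"

# Option 1: \g<0> is the template syntax for the whole match
via_template = re.sub(pattern, r"\2\1\g<0>", "ab")

# Option 2 (hypothetical helper): build the replacement in a function;
# group(0) is the whole match, group(1) and group(2) are the captured groups
def swap_then_whole(m):
    return m.group(2) + m.group(1) + m.group(0)

via_function = re.sub(pattern, swap_then_whole, "ab")

# Option 3: Match.expand uses the same template syntax as re.sub
via_expand = re.match(pattern, "ab").expand(r"\2\1\g<0>")

print(via_template, via_function, via_expand)  # baab baab baab
```

The function form is more verbose, but it avoids template escape rules entirely, which can help when the replacement logic grows beyond simple group references.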

Reference:

Python Standard Library documentation
