
"re.sub" method with ".*"

I was using Python's re library and came across the following behavior.

>>> import re
>>> re.sub(pattern=".*", repl="r", string="hello")
'rr'

As you can see, with the pattern .* and the replacement string r, re.sub returns rr. I was expecting r, because .* should match the entire string. Why is that? I also tested the same logic in Go, and there it returned the expected result.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`.*`)
    fmt.Println(re.ReplaceAllString("Hello", "r")) // Will print `r`
}

The following example starts to explain what is going on:

>>> re.sub("x?", "_", "hello")
'_h_e_l_l_o_'

At every position in the string, including the position just past the last character, re.sub tries to match x?. It succeeds every time, because x? can match the empty string, and it replaces each empty match with _.
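To make those positions visible, here is a small check with re.finditer, which reports the span of each match. The output comments assume Python 3.7 or later. There is one empty match per position, six in total for a five-character string, because the position after the final "o" counts too.

```python
import re

# Spans of every match of "x?" in "hello": each one is empty, and there is
# one per position, including position 5 just past the final "o".
spans = [m.span() for m in re.finditer(r"x?", "hello")]
print(spans)  # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]

# re.sub inserts "_" at each of those six empty matches.
print(re.sub(r"x?", "_", "hello"))  # _h_e_l_l_o_
```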

In a similar fashion, in the following

>>> re.sub(".*", "r", "hello")
'rr'

re.sub first tries to match .* at position 0, succeeds, and consumes the whole string. It then tries to match at the end position (5), succeeds by matching the empty string, and replaces that empty match with r as well. The 'puzzling' behavior goes away if you disallow empty matches:
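The two matches can be listed the same way. A minimal check, with the output as seen on Python 3.7 or later:

```python
import re

# finditer reports the two matches that re.sub replaces:
# the whole string (0, 5), then the empty string at the end (5, 5).
spans = [m.span() for m in re.finditer(r".*", "hello")]
print(spans)  # [(0, 5), (5, 5)]

# One "r" per match, hence "rr".
print(re.sub(r".*", "r", "hello"))  # rr
```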

>>> re.sub(".+", "r", "hello")
'r'
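If the goal is to get Go's single r in Python, there are a few ways to avoid the trailing empty match. The sketch below compares three of them; the specific patterns are illustrations, not the only options.

```python
import re

text = "hello"

# Option 1: ".+" needs at least one character, so it can never match
# the empty string at the end.
one_or_more = re.sub(r".+", "r", text)

# Option 2: anchor the pattern to the whole string. After the first match
# ends at position 5, \A cannot match there, so no second match is found.
anchored = re.sub(r"\A.*\Z", "r", text)

# Option 3: keep ".*" but stop after the first replacement.
first_only = re.sub(r".*", "r", text, count=1)

print(one_or_more, anchored, first_only)  # r r r
```

The three options differ on an empty input string: re.sub(r".+", "r", "") returns "" unchanged, while the anchored pattern and the count=1 version both return "r".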

In versions prior to Python 3.7, if re.sub consumed the whole string, it would not try to match again at the end; in Python 3.7+ it does. More precisely, quoting the documentation of re.sub:

Changed in version 3.7: Empty matches for the pattern are replaced when adjacent to a previous non-empty match.

Python 3.7+ (consistent behavior)

>>> matches = lambda r, s: [m.span() for m in re.finditer(r, s)]
>>> matches("x?", "x")
[(0, 1), (1, 1)]
>>> matches("x?", "y")
[(0, 0), (1, 1)]
>>> re.sub("x?", "r", "x")
'rr'
>>> re.sub("x?", "r", "y")
'ryr'
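One way to read the 3.7+ behavior is that re.sub replaces exactly the matches that re.finditer reports, as the session above shows for x?. The helper sub_via_finditer below is a hypothetical sketch built on that assumption, not how the re module is actually implemented. It uses Match.expand so that the replacement string is processed like a re.sub string template.

```python
import re

def sub_via_finditer(pattern, repl, string):
    """Hypothetical helper: replace every match that re.finditer reports.

    Assumes Python 3.7+ semantics, where re.sub and re.finditer agree on
    which (possibly empty) matches exist.
    """
    pieces = []
    last_end = 0
    for m in re.finditer(pattern, string):
        pieces.append(string[last_end:m.start()])  # unmatched text before the match
        pieces.append(m.expand(repl))              # replacement, with template escapes processed
        last_end = m.end()
    pieces.append(string[last_end:])               # trailing unmatched text
    return "".join(pieces)

print(sub_via_finditer(r".*", "r", "hello"))  # rr
print(sub_via_finditer(r"x?", "r", "x"))      # rr
print(sub_via_finditer(r"x?", "r", "y"))      # ryr
```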

Python 3.6 (inconsistent behavior)

>>> import re
>>> matches = lambda r, s: [m.span() for m in re.finditer(r, s)]
>>> matches("x?", "x")
[(0, 1), (1, 1)]
>>> matches("x?", "y")
[(0, 0), (1, 1)]
>>> re.sub("x?", "r", "x")
'r'
>>> re.sub("x?", "r", "y")
'ryr'
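The pre-3.7 rule from the documentation quote can be approximated on top of a current finditer: skip any empty match that starts exactly where the previous non-empty match ended. sub_pre37_rule is a hypothetical helper that reproduces the 3.6 results above for these inputs. It is not a faithful port of the old engine, since 3.7 also changed which matches finditer reports in some edge cases.

```python
import re

def sub_pre37_rule(pattern, repl, string):
    """Hypothetical helper approximating the pre-3.7 re.sub rule: an empty
    match that starts exactly where a previous non-empty match ended is
    skipped instead of replaced. Runs on the current re.finditer, so it is
    only an approximation of the 3.6 engine."""
    pieces = []
    last_end = 0
    prev_nonempty_end = None  # end of the most recent non-empty match, if any
    for m in re.finditer(pattern, string):
        start, end = m.span()
        if start == end and start == prev_nonempty_end:
            continue  # pre-3.7 rule: drop the empty match adjacent to it
        pieces.append(string[last_end:start])  # unmatched text before the match
        pieces.append(m.expand(repl))          # replacement text
        last_end = end
        prev_nonempty_end = end if end > start else None
    pieces.append(string[last_end:])           # trailing unmatched text
    return "".join(pieces)

print(sub_pre37_rule(r"x?", "r", "x"))      # r
print(sub_pre37_rule(r"x?", "r", "y"))      # ryr
print(sub_pre37_rule(r".*", "r", "hello"))  # r
```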

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM