"re.sub" method with ".*"
I was using Python's re library and came across the following behavior.
>>> import re
>>> re.sub(pattern=".*", repl="r", string="hello")
'rr'
As you can see, for the pattern .* and the replacement character (r), the re.sub method returns rr. But I was expecting the result to be r, because .* would match the entire string. Why is that? I have also tested the same logic in Go, and there it returned the expected result.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`.*`)
	fmt.Println(re.ReplaceAllString("Hello", "r")) // Will print `r`
}
The following should start explaining what's going on:
>>> re.sub("x?", "_", "hello")
'_h_e_l_l_o_'
At every position in the string, re.sub tries to match x?. It succeeds, because x? can match the empty string, and replaces that empty string with _.
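To make the "every position" claim concrete, here is a minimal sketch that lists the span of each match that x? finds in "hello". The variable names text and spans are only illustrative.

```python
import re

text = "hello"

# Collect the (start, end) span of every match of x? in the text.
spans = [m.span() for m in re.finditer("x?", text)]
print(spans)  # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]

# There is one empty match per position, including the position
# just after the last character, so len(text) + 1 replacements happen.
print(len(spans) == len(text) + 1)  # True
```

Six empty matches give six underscores, which is exactly what '_h_e_l_l_o_' contains.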
In a similar fashion, in the following
>>> re.sub(".*", "r", "hello")
'rr'
we have that re.sub tries to match .* at position 0, succeeds, and consumes the whole string. Then it tries to match at the end position, succeeds (matching the empty string), and replaces it with r again. The 'puzzling' behavior goes away if you disallow the empty match:
>>> re.sub(".+", "r", "hello")
'r'
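Besides switching to .+, there are a couple of other ways to get a single replacement. The sketch below is not from the original answer; it compares three variants using only standard re.sub features (the count argument and the ^ anchor) and also shows that they differ on an empty input string. The variable names are illustrative.

```python
import re

s = "hello"

# .+ needs at least one character, so it cannot match the empty string at the end.
only_nonempty = re.sub(".+", "r", s)

# count=1 stops after the first replacement, so the trailing empty match is ignored.
first_only = re.sub(".*", "r", s, count=1)

# Without re.MULTILINE, ^ only matches at position 0, so the match attempt
# at the end of a non-empty string fails.
anchored = re.sub("^.*", "r", s)

print(only_nonempty, first_only, anchored)

# The variants are not equivalent on empty input:
empty_plus = re.sub(".+", "r", "")            # nothing to match, stays empty
empty_count = re.sub(".*", "r", "", count=1)  # the single empty match is replaced
print(repr(empty_plus), repr(empty_count))
```

Which variant fits depends on whether an empty string should be replaced as well.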
In versions prior to Python 3.7, if re.sub consumed the whole string it would not try to match at the end again, whereas in Python 3.7+ it does. To be more specific, quoting the documentation of re.sub:
Changed in version 3.7: Empty matches for the pattern are replaced when adjacent to a previous non-empty match.
In Python 3.7+:
>>> matches = lambda r, s: [m.span() for m in re.finditer(r, s)]
>>> matches("x?", "x")
[(0, 1), (1, 1)]
>>> matches("x?", "y")
[(0, 0), (1, 1)]
>>> re.sub("x?", "r", "x")
'rr'
>>> re.sub("x?", "r", "y")
'ryr'
Before Python 3.7:
>>> matches("x?", "x")
[(0, 1), (1, 1)]
>>> matches("x?", "y")
[(0, 0), (1, 1)]
>>> re.sub("x?", "r", "x")
'r'
>>> re.sub("x?", "r", "y")
'ryr'
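As a rough illustration of the Python 3.7+ rule, the sketch below rebuilds the result by replacing every span that re.finditer reports. The helper sub_via_finditer is hypothetical and is not how re.sub is implemented; it assumes Python 3.7 or later and a plain-string replacement with no group references.

```python
import re

def sub_via_finditer(pattern, repl, string):
    """Replace every span reported by re.finditer with repl (hypothetical helper)."""
    pieces = []
    last = 0
    for m in re.finditer(pattern, string):
        pieces.append(string[last:m.start()])  # untouched text before this match
        pieces.append(repl)                    # replacement, even for an empty match
        last = m.end()
    pieces.append(string[last:])               # untouched text after the last match
    return "".join(pieces)

for pattern, string in [(".*", "hello"), ("x?", "x"), ("x?", "y")]:
    print(pattern, string, sub_via_finditer(pattern, "r", string))
```

On Python 3.7+ these results agree with re.sub for the same inputs, because the empty match at the end of the string is replaced just like any other match.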