
re.sub(".*", "(replacement)", "text") doubles replacement on Python 3.7

On Python 3.7 (tested on 64-bit Windows), replacing a string with the regex .* yields the replacement string repeated twice!

On Python 3.7.2:

>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)(replacement)'

On Python 3.6.4:

>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)'

On Python 2.7.5 (32-bit):

>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)'

What is wrong? How can I fix it?

This is not a bug, but the result of a bug fix in Python 3.7, from commit fbb490fd2f38bd817d99c20c05121ad0168a38ee.

In regex, a non-zero-width match moves the pointer to the end of the match, so that the next match attempt, zero-width or not, can continue from the position following the match. So in your example, after .* greedily matches and consumes the entire string, the pointer sits at the end of the string, which still leaves "room" for a zero-width match at that position. This is evident from the following code, which behaves the same in Python 2.7, 3.6 and 3.7:

>>> re.findall(".*", 'sample text')
['sample text', '']
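
To make the two matches easier to see, here is a minimal sketch that prints the span of each match instead of its text. The variable names are illustrative only; the second span is the zero-width match at the end of the string.

```python
import re

text = "sample text"

# Collect the (start, end) span of every match of .* in the text.
spans = [m.span() for m in re.finditer(".*", text)]

# The first span covers the whole string; the second is empty (start == end).
print(spans)  # [(0, 11), (11, 11)]
```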

So the bug fix, which concerns the replacement of a zero-width match right after a non-zero-width match, now correctly replaces both matches with the replacement text.
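
One way to check this on your own interpreter is to pass a function as the replacement and record which matches re.sub actually hands to it. This is only a sketch: the helper name record is made up for illustration, and the number of recorded spans depends on the Python version (two on 3.7 and later, one on the older versions shown in the question).

```python
import re
import sys

calls = []

def record(match):
    """Remember the span of each match that re.sub decides to replace."""
    calls.append(match.span())
    return "(replacement)"

result = re.sub(".*", record, "sample text")

# One "(replacement)" appears in the result per recorded match.
print(sys.version_info[:2], calls, result)
```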

This is a common regex issue that affects many regex flavors.

There are several ways to fix the issue:

  • Add anchors on both sides of .* : re.sub("^.*$", "(replacement)", "sample text")
  • Since you only want to match a line once, add the count=1 argument: re.sub(".*", "(replacement)", "sample text", count=1)
  • If you want to replace any non-empty line, replace * with + : re.sub(".+", "(replacement)", "sample text")

See the Python demo:

import re
# Adding anchors:
print( re.sub("^.*$", "(replacement)", "sample text") ) # => (replacement)
# Using the count=1 argument
print( re.sub(".*", "(replacement)", "sample text", count=1) ) # => (replacement)
# If you want to replace non-empty lines:
print( re.sub(".+", "(replacement)", "sample text") ) # => (replacement)
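
The three fixes agree on "sample text" but not on an empty string, which may matter when choosing between them. The sketch below runs each fix on both inputs; compare_fixes is a hypothetical helper written for this comparison, not part of the re module.

```python
import re

def compare_fixes(text):
    """Apply each suggested fix to text and return the results by name."""
    return {
        "anchors": re.sub("^.*$", "(replacement)", text),
        "count=1": re.sub(".*", "(replacement)", text, count=1),
        "plus": re.sub(".+", "(replacement)", text),
    }

print(compare_fixes("sample text"))  # all three give '(replacement)'
print(compare_fixes(""))             # only the .+ variant leaves '' unchanged
```

With an empty input, the anchored and count=1 variants still insert the replacement once, while .+ finds nothing to replace.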
