
re.sub() documentation misunderstanding

I've just started learning regular expressions, and the documentation for re.sub() states:

Changed in version 3.5: Unmatched groups are replaced with an empty string.

Deprecated since version 3.5, will be removed in version 3.6: Unknown escapes consisting of '\' and an ASCII letter now raise a deprecation warning and will be forbidden in Python 3.6.

Is re.sub() deprecated? What should I use then?

You misunderstand the documentation. The re.sub() function is not deprecated. The deprecation warning concerns specific syntax.

Earlier in the re.sub() documentation you'll find this:

Unknown escapes such as \& are left alone.

If you use an unknown escape made of a backslash and an ASCII letter, the escape is no longer silently accepted; you'll get a warning instead. This applies both to re.sub() replacement patterns and to the regular expression patterns. The same warning is placed in the section on regex pattern syntax.
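To make the distinction concrete, here is a minimal sketch of both cases in a replacement string. The helper name try_letter_escape is made up for illustration. It assumes Python 3.7 or later, where an unknown letter escape in the replacement raises re.error; on 3.5 and 3.6 the same call only emits a DeprecationWarning.

```python
import re

# A non-letter escape such as \& in the replacement is left alone:
# both the backslash and the ampersand are copied into the result.
kept = re.sub(r'b', r'\&', 'abc')
print(kept)

def try_letter_escape():
    # Hypothetical helper: \j is a backslash plus an ASCII letter with no
    # defined meaning, so it is an "unknown escape" in the sense of the docs.
    try:
        return re.sub(r'b', r'\j', 'abc')
    except re.error as exc:
        # Assumed path on Python 3.7+; older versions warn instead.
        return 'error: {}'.format(exc)

letter_result = try_letter_escape()
print(letter_result)
```

Note that re.sub() itself works normally in both calls; only the \j escape is rejected.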

The "Changed in version 3.5" line also concerns how re.sub() works. Rather than raise an exception when there is no matching group for a \number backreference, an empty string is inserted at that location.

The two entries are not related, and re.sub will not be deprecated.

In Python versions earlier than 3.5, re.sub failed if a backreference referred to a capturing group that did not participate in the match. See the "Empty string instead of unmatched group error" SO question.

An example where the failure occurred:

import re
old = 'regexregex'
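# Before Python 3.5 the re.sub() call below failed with an "unmatched group" error:
# there is no "group" between "regex" and "regex" in "regexregex", so Group 1
# did not participate in the match and was not initialized with an empty string.
new = re.sub(r'regex(group)?regex', r'something\1something', old)
print(new)  # Python 3.5+ prints: somethingsomething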
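If the code also has to run on versions older than 3.5, one possible workaround is to build the replacement in a function instead of a template string. This is only a sketch, and the helper name replace_with_optional_group is invented for illustration.

```python
import re

def replace_with_optional_group(text):
    # Hypothetical helper: the replacement is computed by a function, so an
    # unmatched Group 1 (which is None) is turned into '' explicitly.
    def repl(match):
        return 'something' + (match.group(1) or '') + 'something'
    return re.sub(r'regex(group)?regex', repl, text)

without_group = replace_with_optional_group('regexregex')
with_group = replace_with_optional_group('regexgroupregex')
print(without_group)  # somethingsomething
print(with_group)     # somethinggroupsomething
```

Because the function handles the None case itself, the result does not depend on how a given Python version treats unmatched groups in \1.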

As for the second one, it only says that an unknown escape, i.e. a literal backslash followed by an ASCII letter that has no special meaning to the regex engine, will trigger a warning (and later be forbidden). Previously the backslash was simply ignored in such escapes: in Python 2.x through 3.5, print(re.sub(r'\j', '', 'joy')) prints oy. So, these will be forbidden in Python 3.6.
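The pattern side can be checked the same way. The sketch below assumes Python 3.6 or later, where compiling r'\j' raises re.error (3.5 only warns). The helper compile_or_report is hypothetical. The last two calls show the usual fixes: drop the stray backslash, or double it when a literal backslash is really meant.

```python
import re

def compile_or_report(pattern):
    # Hypothetical helper: returns the compiled pattern, or the error text
    # if the regex engine rejects the pattern.
    try:
        return re.compile(pattern)
    except re.error as exc:
        return 'error: {}'.format(exc)

bad = compile_or_report(r'\j')            # unknown letter escape
plain = re.sub(r'j', '', 'joy')           # no backslash needed to match "j"
literal = re.sub(r'\\j', '', r'\joy')     # \\ matches one literal backslash
print(bad)
print(plain)    # oy
print(literal)  # oy
```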
