
Adding re.compile to list

When I compile a regex and assign it to a variable or add it to a list, I get suspiciously different behaviour on Python 2.x and 3.x.

import re
z = re.compile('a')
print(z)

The above snippet prints on 2.x

<_sre.SRE_Pattern object at 0x7ff839e57030>

and on 3.x

re.compile('a')

The first one looks like the regex is compiled and ready to go whenever I need it (which is what I want), but the second one still says re.compile.

Does that mean the regex is compiled on the fly when I need it and, even worse, recompiled every time I reference z and do something like z.match('a')? Or is the described Python 3 behaviour just cosmetic, with a compiled copy still maintained under the hood?

My point is that I (statically) compile my regexes at the beginning of the source file to save time where I reference them repeatedly in loops. If that is not actually happening, then that isn't good.

All this means is that the __repr__ of _sre.SRE_Pattern has been changed, from the (not terribly helpful) default "<classname object at address>" to something more useful. Per the data model documentation (emphasis mine):

If at all possible, [the __repr__ string representation of an object] should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description...> should be returned.
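As a small illustration of that guideline (a sketch for Python 3 only, not something from the original answer), the text returned by repr() can be evaluated to get an equivalent pattern back. The variable names here are made up for the example, and eval() is used purely to demonstrate the point:

```python
import re

# Python 3: repr() of a compiled pattern reads like the call that built it.
original = re.compile('a')
text = repr(original)

# Evaluating that text yields an equivalent pattern object.
# eval() is for illustration only; avoid it on untrusted input.
rebuilt = eval(text, {"re": re})

same_source = rebuilt.pattern == original.pattern
same_flags = rebuilt.flags == original.flags
print(text, same_source, same_flags)
```

The repr is only a description of the object. Printing it does not trigger a new compilation of z.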

Compare 2.x:

>>> import re
>>> a = re.compile('a')
>>> a
<_sre.SRE_Pattern object at 0x02654440>
>>> type(a)
<type '_sre.SRE_Pattern'>
>>> repr(a)
'<_sre.SRE_Pattern object at 0x02654440>'

And 3.x:

>>> import re
>>> a = re.compile('a')
>>> a
re.compile('a')
>>> type(a)
<class '_sre.SRE_Pattern'>
>>> repr(a)
"re.compile('a')"

There is no difference in behaviour: the regular expression is still only compiled once (that's the whole point).
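For the setup described in the question (patterns compiled once at the top of the file, stored in a list, then reused in a loop), a minimal sketch might look like the following. PATTERNS and count_matches are hypothetical names chosen for this example:

```python
import re

# Compile once, up front, and keep the pattern objects in a list.
PATTERNS = [re.compile(r'a'), re.compile(r'\d+')]

def count_matches(lines):
    """Count lines matched at their start by any precompiled pattern."""
    hits = 0
    for line in lines:
        # Each call reuses the already-compiled pattern object.
        if any(p.match(line) for p in PATTERNS):
            hits += 1
    return hits

z = PATTERNS[0]
# The list holds the compiled object itself, not a recipe for building it.
is_compiled = isinstance(z, type(re.compile('')))
same_object = z is PATTERNS[0]
total = count_matches(['abc', '42', 'xyz'])
print(is_compiled, same_object, total)
```

Taking z out of the list just binds another name to the same compiled object, so z.match(...) in a loop does not go through re.compile again on either 2.x or 3.x.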
