Use Python's string.replace vs re.sub
For Python 2.5, 2.6, should I be using string.replace or re.sub for basic text replacements?
In PHP, this was explicitly stated, but I can't find a similar note for Python.
As long as you can make do with str.replace(), you should use it. It avoids all the pitfalls of regular expressions (like escaping), and is generally faster.
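To make the escaping pitfall concrete, here is a minimal sketch. The sample string and variable names are made up for illustration, and it is written for Python 3, although the same calls exist in 2.5/2.6.

```python
import re

text = "Price: $5.00 (approx.)"

# str.replace treats both arguments as literal text.
literal = text.replace("$5.00", "$6.00")

# re.sub treats the first argument as a pattern. Here "$" is an
# end-of-string anchor and "." matches any character, so the naive
# pattern never matches and the text comes back unchanged.
naive = re.sub("$5.00", "$6.00", text)

# re.escape makes the pattern literal again, at the cost of an extra step.
escaped = re.sub(re.escape("$5.00"), "$6.00", text)

print(literal)  # Price: $6.00 (approx.)
print(naive)    # Price: $5.00 (approx.)
print(escaped)  # Price: $6.00 (approx.)
```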
str.replace() should be used whenever possible. It's more explicit, simpler, and faster.
In [1]: import re
In [2]: text = """For python 2.5, 2.6, should I be using string.replace or re.sub for basic text replacements.
In PHP, this was explicitly stated but I can't find a similar note for python.
"""
In [3]: timeit text.replace('e', 'X')
1000000 loops, best of 3: 735 ns per loop
In [4]: timeit re.sub('e', 'X', text)
100000 loops, best of 3: 5.52 us per loop
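The session above uses IPython's timeit magic. Outside IPython, a rough equivalent with the standard timeit module might look like the sketch below. The helper names with_replace and with_sub and the loop count are my own choices, and the absolute numbers will differ by machine and Python version.

```python
import re
import timeit

text = (
    "For python 2.5, 2.6, should I be using string.replace or re.sub "
    "for basic text replacements.\n"
    "In PHP, this was explicitly stated but I can't find a similar note "
    "for python.\n"
)

def with_replace():
    # Plain literal replacement.
    return text.replace("e", "X")

def with_sub():
    # Same replacement through the regex engine.
    return re.sub("e", "X", text)

loops = 10000  # arbitrary; raise it for steadier numbers
replace_time = timeit.timeit(with_replace, number=loops)
sub_time = timeit.timeit(with_sub, number=loops)

print("str.replace: %.3f us per loop" % (replace_time / loops * 1e6))
print("re.sub:      %.3f us per loop" % (sub_time / loops * 1e6))
```

Both functions return the same string, so the comparison measures only the cost of the two approaches.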
String manipulation is usually preferable to regex when you can figure out how to adapt it. Regex is incredibly powerful, but it's usually slower, and usually harder to write, debug, and maintain.
That being said, notice the amount of "usually" in the above paragraph! It's possible (and I've seen it done) to write a zillion lines of string manipulation for something you could've done with a 20-character regex. It's also possible to waste valuable time using "efficient" string functions on tasks a good regex engine could do almost as fast. Then there's maintainability: Regex can be horribly complex, but sometimes a regex will be simpler and easier to read than a giant block of procedural code.
Regex is fantastic for its intended purpose: searching for highly-variable needles in highly-variable haystacks. Think of it as a precision torque wrench: It's the perfect tool for a specific set of jobs, but it makes a lousy hammer.
- Is the pattern you're looking for highly static? For example, do you want to split a string on every comma, pipe, or tab?
- Is resource efficiency more important than developer time? What are your priorities? Remember: Hardware is cheap, programmers are expensive.
- Are you working with HTML, XML, or other context-free grammars? Don't forget that regex has limitations.
- And my #1 rule of thumb: If you work on the problem for 5 minutes, can you rough out an idea for a non-regex approach?
If the answer to any of these questions is "yes", you probably want string manipulation. Otherwise, consider regex.
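As a small illustration of the first question, here is a sketch of splitting on every comma, pipe, or tab using only string methods, with the regex version alongside for comparison. The sample line is invented.

```python
import re

line = "alpha,beta|gamma\tdelta"

# String-method approach: normalize every delimiter to a comma, then split.
fields = line.replace("|", ",").replace("\t", ",").split(",")

# Regex approach: a character class that matches any one delimiter.
fields_re = re.split(r"[,|\t]", line)

print(fields)     # ['alpha', 'beta', 'gamma', 'delta']
print(fields_re)  # ['alpha', 'beta', 'gamma', 'delta']
```

With a handful of fixed delimiters either version is readable. The string version starts to get unwieldy once the delimiter set grows or becomes variable, which is where the regex earns its keep.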
Another thing to consider: if you are doing fairly complex replacements, str.translate() might be what you're looking for.
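A minimal sketch of str.translate() for several replacements in one pass. This uses the Python 3 form, where str.maketrans accepts a dict and a character can map to a longer string. In Python 2.5/2.6, str.translate instead takes a 256-character table built with string.maketrans and can only map one character to another. The sample input is invented.

```python
# Map each special character to its replacement text.
table = str.maketrans({"<": "&lt;", ">": "&gt;", "&": "&amp;"})

raw = "a < b && c > d"

# translate walks the string once, so the "&" inside "&lt;" is not
# escaped a second time, unlike a careless chain of replace() calls.
escaped = raw.translate(table)

print(escaped)  # a &lt; b &amp;&amp; c &gt; d
```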