Should I optimise my Python code like C++? Does it matter?

I had an argument with a colleague about writing Python efficiently. He claimed that even though you are programming in Python, you still have to optimise the little bits of your software as much as possible, as if you were writing an efficient algorithm in C++.

Things like:

  • In an if statement with an "or", always put the condition most likely to fail first, so the second will not be checked.
  • Use the most efficient functions for manipulating strings in common use. Not code that grinds strings, but simple things like doing joins and splits, and finding substrings.
  • Call as few functions as possible, even if it comes at the expense of readability, because of the overhead this creates.

I say that in most cases it doesn't matter. I should also say that the context of the code is not a super-efficient NOC or a missile-guidance system. We're mostly writing tests in Python.

What's your view of the matter?

My answer to that would be:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

(Quoting Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268)

If your application is doing anything like a query to the database, that one query will take more time than anything you can gain with that kind of small optimization anyway.

And if you are chasing performance like that, why not code in assembly language, after all? Because Python is easier and faster to write and maintain? Well, if so, you are right :-)

The most important thing is that your code is easy to maintain, not a couple of microseconds of CPU time!
Well, maybe except if you have thousands of servers, but is that your case?

The answer is really simple:

  • Follow Python best practices, not C++ best practices.
  • Readability in Python is more important than speed.
  • If performance becomes an issue, measure, then start optimizing.

This sort of premature micro-optimisation is usually a waste of time in my experience, even in C and C++. Write readable code first. If it's running too slowly, run it through a profiler, and if necessary, fix the hot-spots.

Fundamentally, you need to think about return on investment. Is it worth the extra effort in reading and maintaining "optimised" code for the couple of microseconds it saves you? In most cases it isn't.

(Also, compilers and runtimes are getting cleverer. Some micro-optimisations may become micro-pessimisations over time.)

I agree with others: readable code first ("Performance is not a problem until performance is a problem.").

I only want to add that when you absolutely need to write some unreadable and/or non-intuitive code, you can generally isolate it in a few specific methods, for which you can write detailed comments, and keep the rest of your code highly readable. If you do so, you'll end up with easy-to-maintain code, and you'll only have to go through the unreadable parts when you really need to.

"I should also say that the context of the code is not a super-efficient NOC or a missile-guidance system. We're mostly writing tests in Python."

Given this, I'd say that you should take your colleague's advice about writing efficient Python but ignore anything he says that goes against prioritizing readability and maintainability of the code, which will probably be more important than the speed at which it'll execute.

"In an if statement with an or always put the condition most likely to fail first, so the second will not be checked."

This is generally good advice, but it also depends on the logic of your program. If it makes sense that the second statement is not evaluated if the first returns false, then do so. Doing the opposite could be a bug otherwise.

"Use the most efficient functions for manipulating strings in common use. Not code that grinds strings, but simple things like doing joins and splits, and finding substrings."

I don't really get this point. Of course you should use the library-provided functions, because they are probably implemented in C, and a pure Python implementation is most likely to be slower. In any case, no need to reinvent the wheel.
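To make that concrete, here is a minimal sketch that puts a hand-rolled substring search next to the built-in str.find, plus the usual split/join one-liners. The helper find_substring_manual and the sample text are made up for illustration; they are not from the question.

```python
def find_substring_manual(haystack, needle):
    """Hand-rolled substring search (hypothetical helper, for comparison only)."""
    n, m = len(haystack), len(needle)
    for start in range(n - m + 1):
        if haystack[start:start + m] == needle:
            return start
    return -1


text = "spam, eggs, bacon and spam"

# Both give the same index; the built-in is shorter and, in CPython,
# implemented in C.
manual_index = find_substring_manual(text, "bacon")
builtin_index = text.find("bacon")

# Common split and join operations are likewise one-liners.
words = text.replace(" and ", ", ").split(", ")
rejoined = " | ".join(words)
```

The built-in version is also the more readable one, so here efficiency and readability point the same way.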

"Call as few functions as possible, even if it comes at the expense of readability, because of the overhead this creates."

$ cat withcall.py
def square(a):
        return a*a

for i in xrange(1,100000):
        i_square = square(i)

$ cat withoutcall.py
for i in xrange(1,100000):
        i_square = i*i

$ time python2.3 withcall.py
real    0m5.769s
user    0m4.304s
sys     0m0.215s
$ time python2.3 withcall.py
real    0m5.884s
user    0m4.315s
sys     0m0.206s

$ time python2.3 withoutcall.py
real    0m5.806s
user    0m4.172s
sys     0m0.209s
$ time python2.3 withoutcall.py
real    0m5.613s
user    0m4.171s
sys     0m0.216s

I mean... come on... please.
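Note that timing the whole process as above also counts interpreter start-up. If you want to look at just the loop, something like the following sketch with the standard timeit module would do it. It is written for Python 3, so range replaces xrange; the function names with_call and without_call are mine, and I'm deliberately not quoting numbers since they vary by machine and interpreter version.

```python
import timeit


def square(a):
    return a * a


def with_call(n=100000):
    # Same loop as withcall.py: one function call per iteration.
    for i in range(1, n):
        i_square = square(i)


def without_call(n=100000):
    # Same loop as withoutcall.py: the multiplication is inlined.
    for i in range(1, n):
        i_square = i * i


if __name__ == "__main__":
    # Run each variant a few times and keep the best, to reduce noise.
    t_call = min(timeit.repeat(with_call, number=10, repeat=3))
    t_inline = min(timeit.repeat(without_call, number=10, repeat=3))
    print("with call:    %.4f s" % t_call)
    print("without call: %.4f s" % t_inline)
```

Whatever difference shows up, weigh it against how often that loop actually runs in your tests.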

I think there are several related "urban legends" here.

  • False: Putting the more often-checked condition first in a conditional, and similar optimizations, save enough time in a typical program to be worth a typical programmer's effort.

  • True: Some people, but not many, use such styles in Python in the incorrect belief outlined above.

  • True: Many people use such a style in Python when they think that it improves the readability of a Python program.

About readability: I think it's indeed useful to put the most useful condition first, since this is what people notice first anyway. You should also use ''.join() if you mean concatenation of strings, since it's the most direct way to do it (the s += x operation could mean something different).

"Call as few functions as possible" decreases readability and goes against the Pythonic principle of code reuse. And so it's not a style people use in Python.

Before introducing performance optimizations at the expense of readability, look into modules like psyco that will do some JIT-ish compiling of distinct functions, often with striking results, with no impairment of readability.

Then, if you really want to embark on the optimization path, you must first learn to measure and profile. Optimization MUST BE QUANTITATIVE: do not go with your gut. The hotspot profiler will show you the functions where your program is burning up the most time.
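As a rough sketch of what "measure and profile" can look like, here is one way to do it with cProfile and pstats from the standard library (my choice for the example; any profiler you trust will do). The functions slow_part, fast_part and main are invented stand-ins for real program code.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Deliberately wasteful: builds a whole list just to sum it.
    return sum([i * i for i in range(n)])


def fast_part(n):
    return n * 2


def main():
    total = 0
    for _ in range(50):
        total += slow_part(10000)
        total += fast_part(10000)
    return total


profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Sort by cumulative time and show only the top few entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists call counts and time per function, which is the quantitative evidence to look at before touching any code.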

If profiling turns up that a function like this is being called frequently:

def get_order_qty(ordernumber):
    # look up order in database and return quantity
    pass

If there is any repetition of ordernumber values, then memoization would be a good optimization technique to learn, and it is easily packaged in an @memoize decorator so that there is little impact on program readability. The effect of memoizing is that values returned for a given set of input arguments are cached, so that the expensive function is called only once, with subsequent calls resolved against the cache.
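A minimal sketch of such a decorator follows, under a few assumptions of mine: the arguments are positional and hashable, the database is faked with a dictionary (FAKE_ORDERS is made up), and call_count exists only to show how often the real lookup runs.

```python
import functools

call_count = 0  # counts how often the "expensive" lookup really runs


def memoize(func):
    """Cache results keyed by the positional arguments (must be hashable)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


# Stand-in for the database: hypothetical data for illustration only.
FAKE_ORDERS = {1001: 5, 1002: 12}


@memoize
def get_order_qty(ordernumber):
    global call_count
    call_count += 1
    return FAKE_ORDERS[ordernumber]


# Four lookups, but only two distinct order numbers.
quantities = [get_order_qty(n) for n in (1001, 1002, 1001, 1001)]
```

In Python 3 the standard library's functools.lru_cache does the same job. Either way, keep in mind that a cached quantity goes stale if the underlying order changes while the program runs.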

Lastly, consider lifting invariants out of loops. For large multi-dimensional structures, this can save a lot of time. In fact, in this case I would argue that this optimization improves readability, as it often serves to make clear that some expression can be computed at a higher-level dimension in the nested logic.
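Here is a small sketch of what lifting invariants looks like on a two-dimensional structure. The functions, the row_weights argument and the sample data are all invented for the example.

```python
import math


def weighted_naive(grid, row_weights, scale):
    out = []
    for r, row in enumerate(grid):
        new_row = []
        for value in row:
            # math.sqrt(scale) and row_weights[r] are evaluated for every
            # cell, although neither depends on the inner loop variable.
            new_row.append(value * (math.sqrt(scale) * row_weights[r]))
        out.append(new_row)
    return out


def weighted_hoisted(grid, row_weights, scale):
    root = math.sqrt(scale)  # invariant for the whole computation
    out = []
    for r, row in enumerate(grid):
        row_factor = root * row_weights[r]  # invariant for this row
        new_row = []
        for value in row:
            new_row.append(value * row_factor)
        out.append(new_row)
    return out


grid = [[1.0, 2.0], [3.0, 4.0]]
naive = weighted_naive(grid, [1.0, 0.5], 4.0)
hoisted = weighted_hoisted(grid, [1.0, 0.5], 4.0)
```

The hoisted version also tells the reader at a glance which values depend on the row and which depend on nothing in the loops at all.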

BTW, is this really what you meant?

"In an if statement with an or always put the condition most likely to fail first, so the second will not be checked."

I should think this might be the case for "and", but an "or" will short-circuit if the first value is True, saving the evaluation of the second term of the conditional. So I would change this optimization "rule" to:

  • If testing "A and B", put A first if it is more likely to evaluate to False.
  • If testing "A or B", put A first if it is more likely to evaluate to True.
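If you want to convince yourself of the short-circuit behaviour, a quick sketch like this records which operands actually get evaluated. The check helper is made up purely for the demonstration.

```python
calls = []


def check(name, result):
    """Record that this condition was evaluated, then return its result."""
    calls.append(name)
    return result


# "and" stops at the first falsy operand: B is never evaluated.
calls.clear()
and_result = check("A", False) and check("B", True)
and_calls = list(calls)

# "or" stops at the first truthy operand: B is never evaluated.
calls.clear()
or_result = check("A", True) or check("B", False)
or_calls = list(calls)

# When the first operand does not decide the outcome, both are evaluated.
calls.clear()
both_result = check("A", True) and check("B", True)
both_calls = list(calls)
```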

But often, the sequence of conditions is driven by the tests themselves:

if obj is not None and hasattr(obj,"name") and obj.name.startswith("X"):

You can't reorder these for optimization. They have to be in this order (or you can just let the exceptions fly and catch them later):

if obj.name.startswith("X"):
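The two styles can be sketched side by side like this. The Named class and both function names are hypothetical, and returning False on failure is just one possible way of "catching them later".

```python
class Named:
    """Tiny stand-in object with a name attribute (for illustration only)."""

    def __init__(self, name):
        self.name = name


def starts_with_x_checked(obj):
    # Look before you leap: the order of the three tests is forced,
    # because each one guards the next.
    return obj is not None and hasattr(obj, "name") and obj.name.startswith("X")


def starts_with_x_eafp(obj):
    # Let the exception fly and catch it.
    try:
        return obj.name.startswith("X")
    except AttributeError:
        # Raised when obj is None, lacks a name attribute,
        # or name is not string-like.
        return False
```

Both return the same answers for ordinary inputs; which one reads better depends on how exceptional the missing attribute really is.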

Sure, follow Python best practices (and in fact I agree with the first two recommendations), but maintainability and efficiency are not opposites; they are mostly togethers (if that's a word).

Statements like "always write your IF statements a certain way for performance" are a priori, i.e. not based on knowledge of what your program spends time on, and are therefore guesses. The first (or second, or third, whatever) rule of performance tuning is: don't guess.

If, after you measure, profile, or in my case do this, you actually know that you can save much time by re-ordering tests, then by all means do. My money says that's at the 1% level or less.

My visceral reaction is this:

I've worked with guys like your colleague and in general I wouldn't take advice from them.

Ask him if he's ever even used a profiler.
