
Speeding Up Python

This is really two questions, but they are so similar that, to keep it simple, I figured I'd just roll them together:

  • Firstly: Given an established Python project, what are some decent ways to speed it up beyond plain in-code optimization?

  • Secondly: When writing a program from scratch in Python, what are some good ways to greatly improve performance?

For the first question, imagine you are handed a decently written project and you need to improve performance, but you can't seem to get much of a gain through refactoring/optimization. What would you do to speed it up in this case, short of rewriting it in something like C?

Regarding "Secondly: When writing a program from scratch in Python, what are some good ways to greatly improve performance?"

Remember the Jackson rules of optimization:

  • Rule 1: Don't do it.
  • Rule 2 (for experts only): Don't do it yet.

And the Knuth rule:

  • "Premature optimization is the root of all evil."

The more useful rules are in the General Rules for Optimization.

  1. Don't optimize as you go. First get it right. Then make it fast. Optimizing a wrong program is still wrong.

  2. Remember the 80/20 rule.

  3. Always run "before" and "after" benchmarks. Otherwise, you won't know if you've found the 80%.

  4. Use the right algorithms and data structures. This rule should really be first: nothing matters as much as the algorithm and data structure.
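
Rule 3 needs nothing beyond the standard `timeit` module; a minimal before/after harness might look like this (the function names are illustrative):

```python
import timeit

def concat_before(n):
    # "Before": repeated += on an immutable string copies data each time.
    s = ""
    for i in range(n):
        s += str(i)
    return s

def concat_after(n):
    # "After": build the pieces lazily and join them in a single pass.
    return "".join(str(i) for i in range(n))

# Benchmark both versions on identical input so the numbers are comparable.
t_before = timeit.timeit(lambda: concat_before(1000), number=100)
t_after = timeit.timeit(lambda: concat_after(1000), number=100)
print("before: %.4fs  after: %.4fs" % (t_before, t_after))
```

Only the relative numbers matter; the absolute timings depend on the machine and interpreter.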

Bottom Line

You can't prevent or avoid the "optimize this program" effort. It's part of the job. You have to plan for it and do it carefully, just like the design, code, and test activities.

Rather than just punting to C, I'd suggest:

Make your code count. Do more with fewer executions of lines:

  • Change the algorithm to a faster one. It doesn't need to be fancy to be faster in many cases.
  • Use Python primitives that happen to be written in C. Some things will force an interpreter dispatch where some won't. The latter is preferable.
  • Beware of code that first constructs a big data structure and then consumes it. Think of the difference between range and xrange. In general, it is often worth thinking about the memory usage of the program. Using generators can sometimes bring O(n) memory use down to O(1).
  • Python is generally non-optimizing. Hoist invariant code out of loops, and eliminate common subexpressions where possible in tight loops.
  • If something is expensive, then precompute or memoize it. Regular expressions can be compiled, for instance.
  • Need to crunch numbers? You might want to check out numpy.
  • Many Python programs are slow because they are bound by disk I/O or database access. Make sure you have something worthwhile to do while you wait on the data to arrive, rather than just blocking. A weapon could be something like the Twisted framework.
  • Note that many crucial data-processing libraries have C versions, be it XML, JSON, or whatnot. They are often considerably faster than the Python interpreter.
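
The precompute/memoize point can be sketched with the standard library alone: `re.compile` for regular expressions and `functools.lru_cache` for memoizing a pure function (the names below are illustrative):

```python
import re
from functools import lru_cache

# Compile once, outside the hot loop, instead of recompiling per call.
WORD_RE = re.compile(r"\w+")

@lru_cache(maxsize=None)
def sum_of_squares(n):
    # Stand-in for an expensive pure computation; repeat calls with the
    # same argument are served from the cache instead of recomputed.
    return sum(i * i for i in range(n))

print(WORD_RE.findall("speed up your Python code"))
print(sum_of_squares(1000))  # computed on the first call
print(sum_of_squares(1000))  # served from the cache
```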

If all of the above fails for profiled and measured code, then begin thinking about the C-rewrite path.

The usual suspects -- profile it, find the most expensive line, figure out what it's doing, fix it. If you haven't done much profiling before, there could be some big fat quadratic loops or string duplication hiding behind otherwise innocuous-looking expressions.

In Python, two of the most common causes I've found for non-obvious slowdown are string concatenation and generators. Since Python's strings are immutable, doing something like this:

result = u""
for item in my_list:
    result += unicode(item)

will copy the entire string twice per iteration. This has been well covered, and the solution is to use "".join:

result = "".join(unicode(item) for item in my_list)

Generators are another culprit. They're very easy to use and can simplify some tasks enormously, but a poorly-applied generator will be much slower than simply appending items to a list and returning the list.

Finally, don't be afraid to rewrite bits in C! Python, as a dynamic high-level language, is simply not capable of matching C's speed. If there's one function that you can't optimize any further in Python, consider extracting it to an extension module.

My favorite technique for this is to maintain both Python and C versions of a module. The Python version is written to be as clear and obvious as possible -- any bugs should be easy to diagnose and fix. Write your tests against this module. Then write the C version, and test it. Its behavior should in all cases equal that of the Python implementation -- if they differ, it should be very easy to figure out which is wrong and correct the problem.

First thing that comes to mind: psyco. It runs only on x86, for the time being.

Then, constant binding. That is, make all global references (and global.attr, global.attr.attr, etc.) be local names inside of functions and methods. This isn't always successful, but in general it works. It can be done by hand, but obviously it is tedious.
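
A common hand-rolled form of constant binding is to pull the attribute lookup into a default argument, which turns a global-plus-attribute lookup into a fast local one. A small sketch, with illustrative names:

```python
import math

def norms_plain(vecs):
    # Each iteration resolves the global name `math`, then the attribute `sqrt`.
    return [math.sqrt(x * x + y * y) for x, y in vecs]

def norms_bound(vecs, sqrt=math.sqrt):
    # `sqrt` is bound once, at function definition time, as a local name.
    return [sqrt(x * x + y * y) for x, y in vecs]

print(norms_bound([(3.0, 4.0), (5.0, 12.0)]))  # [5.0, 13.0]
```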

You said "apart from in-code optimization", so I won't delve into this, but keep your mind open for typical mistakes (for i in range(10000000) comes to mind) that people make.

Cython and Pyrex can be used to generate C code using a Python-like syntax. Psyco is also fantastic for appropriate projects (sometimes you'll not notice much of a speed boost, sometimes it'll be as much as 50x as fast). I still reckon the best way is to profile your code (cProfile, etc.) and then just code the bottlenecks as C functions for Python.

I'm surprised no one mentioned ShedSkin: http://code.google.com/p/shedskin/. It automagically converts your Python program to C++, and in some benchmarks it yields better speed improvements than psyco.

Plus anecdotal stories on its simplicity: http://pyinsci.blogspot.com/2006/12/trying-out-latest-release-of-shedskin.html

There are limitations though, please see: http://tinyurl.com/shedskin-limitations

I hope you've read: http://wiki.python.org/moin/PythonSpeed/PerformanceTips

Summing up what's already there, there are usually 3 principles:

  • Write code that gets transformed into better bytecode: use locals, avoid unnecessary lookups/calls, and use idiomatic constructs (if there's natural syntax for what you want, use it -- it's usually faster; e.g., don't do for key in some_dict.keys(), do for key in some_dict).
  • Whatever is written in C is considerably faster, so abuse whatever C functions/modules you have available.
  • When in doubt, import timeit and profile.
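
When in doubt, `timeit` settles the question. A quick sketch comparing the two dict-iteration spellings (note that in Python 3, keys() returns a cheap view, so the gap is much smaller than it was in Python 2, where keys() built a whole list):

```python
import timeit

d = {i: str(i) for i in range(1000)}

def via_keys():
    return [k for k in d.keys()]

def idiomatic():
    return [k for k in d]

# Measure both spellings; only the relative difference is meaningful.
print("keys():  %.4fs" % timeit.timeit(via_keys, number=1000))
print("direct:  %.4fs" % timeit.timeit(idiomatic, number=1000))
```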

This won't necessarily speed up any of your code, but it is critical knowledge when programming in Python if you want to avoid slowing your code down. The "Global Interpreter Lock" (GIL) has the potential to drastically reduce the speed of your multi-threaded program if its behavior is not understood (yes, this bit me... I had a nice 4-processor machine that wouldn't use more than 1.2 processors at a time). There's an introductory article with some links to get you started at SmoothSpan.
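
One standard way around the GIL for CPU-bound work is the `multiprocessing` module, which sidesteps the lock by using separate processes rather than threads. A minimal sketch (the work function is illustrative):

```python
from multiprocessing import Pool

def cpu_heavy(n):
    # Pure-Python CPU-bound work: threads would serialize on the GIL here,
    # but each worker process has its own interpreter and its own GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # The four tasks can now run on four cores in parallel.
        results = pool.map(cpu_heavy, [100000] * 4)
    print(results)
```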

Run your app through the Python profiler. Find a serious bottleneck. Rewrite that bottleneck in C. Repeat.

People have given some good advice, but you have to be aware that when high performance is needed, the Python model is: punt to C. Efforts like psyco may in the future help a bit, but Python just isn't a fast language, and it isn't designed to be. Very few languages have the ability to do the dynamic stuff really well and still generate very fast code; at least for the foreseeable future (and some of the design works against fast compilation), that will be the case.

So, if you really find yourself in this bind, your best bet will be to isolate the parts of your system that are unacceptably slow in (good) Python, and design around the idea that you'll rewrite those bits in C. Sorry. Good design can help make this less painful. Prototype it in Python first, though; then you've easily got a sanity check on your C as well.

This works well enough for things like numpy, after all. I can't emphasize enough how much good design will help you, though. If you just iteratively poke at your Python bits and replace the slowest ones with C, you may end up with a big mess. Think about exactly where the C bits are needed, and how they can be minimized and encapsulated sensibly.

It's often possible to achieve near-C speeds (close enough for any project using Python in the first place!) by replacing explicit algorithms written out longhand in Python with an implicit algorithm using a built-in Python call. This works because most Python built-ins are written in C anyway. Well, in CPython of course ;-) https://www.python.org/doc/essays/list2str/
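
A small illustration of the point: the longhand loop below runs every step through the interpreter, while the built-in pushes the same loop down into C (in CPython):

```python
def total_longhand(nums):
    # Explicit Python-level loop: one interpreter dispatch per element.
    total = 0
    for n in nums:
        total += n
    return total

def total_builtin(nums):
    # sum() performs the same loop inside the C implementation.
    return sum(nums)

nums = list(range(1000))
print(total_longhand(nums), total_builtin(nums))  # 499500 499500
```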

Just a note on using psyco: in some cases it can actually produce slower run-times, especially when trying to use psyco with code that was written in C. I can't remember the article where I read this, but the map() and reduce() functions were mentioned specifically. Luckily, you can tell psyco not to handle specified functions and/or modules.

This is the procedure that I try to follow:

  • import psyco; psyco.full()
  • If it's not fast enough, run the code through a profiler and see where the bottlenecks are. (DISABLE psyco for this step!)
  • Try to do things such as other people have mentioned to get the code at those bottlenecks as fast as possible.
    • Stuff like [str(x) for x in l] or [x.strip() for x in l] is much, much slower than map(str, l) or map(str.strip, l).
  • After this, if I still need more speed, it's actually really easy to get Pyrex up and running. I first copy a section of Python code, put it directly in the Pyrex code, and see what happens. Then I twiddle with it until it gets faster and faster.
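
The map-versus-comprehension claim in the sub-bullet is easy to check yourself. A sketch (in Python 3, map is lazy, so list() is needed to realize the result, and the gap is smaller than it was in Python 2):

```python
import timeit

data = [" a ", " b ", " c "] * 300

def with_comprehension():
    return [x.strip() for x in data]

def with_map():
    # map(str.strip, data) moves the loop into C; list() realizes it.
    return list(map(str.strip, data))

assert with_comprehension() == with_map()
print("comprehension: %.4fs" % timeit.timeit(with_comprehension, number=1000))
print("map:           %.4fs" % timeit.timeit(with_map, number=1000))
```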

The canonical reference on how to improve Python code is here: PerformanceTips. I'd recommend against optimizing in C unless you really need to, though. For most applications, you can get the performance you need by following the rules posted in that link.

If using psyco, I'd recommend psyco.profile() instead of psyco.full(). For a larger project, it will be smarter about which functions get optimized, and it will use a ton less memory.

I would also recommend looking at iterators and generators. If your application is using large data sets, this will save you many copies of containers.
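
The difference is easy to see in a sketch: both versions below produce the same values, but the list holds all of them at once, while the generator holds only one at a time:

```python
def squares_list(n):
    # Materializes every result up front: O(n) memory.
    return [i * i for i in range(n)]

def squares_gen(n):
    # Yields results one at a time: O(1) memory, no container copy.
    for i in range(n):
        yield i * i

# Both can be consumed identically; only the memory profile differs.
print(sum(squares_list(1000)))  # 332833500
print(sum(squares_gen(1000)))   # 332833500
```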

Besides the (great) psyco and the (nice) shedskin, I'd recommend trying cython, a great fork of pyrex.

Or, if you are not in a hurry, I recommend just waiting. Newer Python virtual machines are coming, and unladen-swallow will find its way into the mainstream.

A couple of ways to speed up Python code were introduced after this question was asked:

  • PyPy has a JIT compiler, which makes it a lot faster for CPU-bound code.
  • PyPy is written in RPython, a subset of Python that compiles to native code, leveraging the LLVM tool-chain.

For an established project, I feel the main performance gain will come from making use of the Python internal libraries as much as possible.

Some tips are here: http://blog.hackerearth.com/faster-python-code

There is also a Python→11l→C++ transpiler, which can be downloaded from here.
