
Python: When to use Threads vs. Multiprocessing

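What are some good guidelines to follow when deciding whether to use threads or multiprocessing, in terms of efficiency and code clarity?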

Many of the differences between threading and multiprocessing are not really Python-specific, and some differences are specific to a certain Python implementation.

For CPython, I would use the multiprocessing module in any of the following cases:

  • I need to make use of multiple cores simultaneously for performance reasons. The global interpreter lock (GIL) would prevent any speedup when using threads. (Sometimes you can get away with threads in this case anyway, for example when the main work is done in C code called via ctypes, or when using Cython and explicitly releasing the GIL where appropriate. Of course the latter requires extra care.) Note that this case is actually rather rare. Most applications are not limited by processor time, and if they really are, you usually don't use Python.

  • I want to turn my application into a real distributed application later. This is a lot easier to do for a multiprocessing application.

  • There is very little shared state needed between the tasks to be performed.
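As a rough illustration of the first case, here is a minimal sketch of spreading CPU-bound work over several cores with multiprocessing.Pool. The prime-counting workload, the function names count_primes and parallel_count, and the chosen limits are all made up for the example; any pure-Python, CPU-heavy function with little shared state would do.

```python
import multiprocessing


def count_primes(limit):
    """Count the primes below `limit` by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in [2, sqrt(n)] divides it evenly.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def parallel_count(limits, workers=None):
    """Run count_primes for each limit in a pool of worker processes."""
    # workers=None lets Pool pick a default based on the CPUs it detects.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    # The guard matters: start methods that launch a fresh interpreter
    # re-import this module in every worker process.
    limits = [2_000, 3_000, 4_000, 5_000]  # hypothetical workload sizes
    print(parallel_count(limits))
```

Each call receives its argument and returns its result by pickling, which is why this style fits best when the tasks share very little state.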

In almost all other circumstances, I would use threads. (This includes making GUI applications responsive.)

For code clarity, one of the biggest things is to learn to know and love the Queue object for talking between threads (or processes, if using multiprocessing; multiprocessing has its own Queue object). Queues make things a lot easier and, I think, enable much cleaner code.

I had a look for some decent Queue examples, and this one has some great examples of how to use them and how useful they are (with the exact same logic applying to the multiprocessing Queue): http://effbot.org/librarybook/queue.htm
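For a quick feel of the pattern, here is a minimal sketch (not taken from the linked page) in which worker threads pull tasks from one Queue and push results onto another. The names worker and square_all, the squaring "work", and the use of None as a stop signal are assumptions made for the example.

```python
import queue
import threading


def worker(tasks, results):
    """Pull numbers off `tasks` until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this thread
            break
        results.put(item * item)


def square_all(numbers, num_workers=3):
    """Square every number using a small pool of worker threads."""
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for n in numbers:
        tasks.put(n)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # All workers have exited, so draining the queue here is safe.
    collected = []
    while not results.empty():
        collected.append(results.get())
    return sorted(collected)


if __name__ == "__main__":
    print(square_all([1, 2, 3, 4, 5]))
```

The threads never touch a shared list or a lock directly; the Queue does the locking internally, which is where the cleaner code comes from.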

For efficiency, the details and outcome may not noticeably affect most people, but for Python <= 3.1 the CPython implementation has some interesting (and potentially brutal) efficiency issues on multicore machines that you may want to know about. These issues involve the GIL. David Beazley did a video presentation on it a while back and it is definitely worth watching. More info here, including a followup talking about significant improvements on this front in Python 3.2.

Basically, my cheap summary of the GIL-related multicore issue is that if you are expecting to get full multi-processor use out of CPython <= 2.7 by using multiple threads, don't be surprised if performance is not great, or even worse than on a single core. But if your threads are doing a bunch of I/O (file read/write, DB access, socket read/write, etc.), you may not even notice the problem.
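To make the I/O case concrete, here is a small sketch in which time.sleep stands in for a blocking read or network call. Like real blocking I/O in CPython, it releases the GIL while waiting, so the threads overlap. The names fake_io and run_io_tasks and the 0.1 second delay are invented for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id, delay=0.1):
    """Pretend to wait on a socket or file, then return a result."""
    time.sleep(delay)  # the GIL is released while the thread sleeps
    return task_id * 10


def run_io_tasks(task_ids, max_workers=5):
    """Run fake_io for every task id on a thread pool and time the batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of the inputs.
        results = list(executor.map(fake_io, task_ids))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_io_tasks(range(5))
    print(results, f"{elapsed:.2f}s")
```

With five workers the batch should take roughly as long as one wait instead of five; swapping time.sleep for a pure-Python computation would remove that benefit under the GIL.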

The multiprocessing module avoids this potential GIL problem entirely by creating a separate Python interpreter (and GIL) per process.


 