
How do CPU cores get allocated to python processes in multiprocessing?

Let's say I am running multiple python processes (not threads) on a multi-core CPU (say 4 cores). The GIL is process-level, so the GIL within a particular process won't affect other processes.

My question here is whether the GIL within one process will take hold of only a single core out of the 4, or of all 4 cores.

If one process locks all cores at once, then multiprocessing should not be any better than multithreading in python. If not, how do the cores get allocated to the various processes?

As an observation: on my system, which has 8 cores (4 * 2 because of hyperthreading), when I run a single CPU-bound process, the CPU usage of 4 out of the 8 cores goes up.

Simplifying this:

4 python threads (in one process) running on a 4-core CPU will take more time than a single thread doing the same work (assuming the work is fully CPU-bound). Will 4 separate processes doing that amount of work cut the time taken by a factor of close to 4?
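A minimal sketch of one way to check this yourself (the busy_work function, the loop size, and the 4-way split are just illustrative, not my actual workload):

```python
# Compare 4 threads vs 4 processes on a purely CPU-bound task.
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def busy_work(n):
    # CPU-bound loop that holds the GIL the whole time
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    N = 5_000_000
    for label, PoolCls in (("threads", ThreadPool), ("processes", Pool)):
        start = time.perf_counter()
        with PoolCls(4) as pool:
            pool.map(busy_work, [N] * 4)
        print(label, time.perf_counter() - start)
```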

Python doesn't do anything to bind processes or threads to cores; it just leaves things up to the OS. When you spawn a bunch of independent processes (or threads, but that's harder to do usefully in Python), the OS's scheduler will quickly and efficiently spread them out across your cores without you, or Python, needing to do anything (barring really bad pathological cases).
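A quick way to see this, as a sketch (Linux-only, since sched_getaffinity is not available everywhere): the affinity mask the OS hands a fresh Python process normally includes every core, and Python never narrows it.

```python
# Inspect which cores the OS will consider scheduling this process on.
import os

print("cores this process may run on:", os.sched_getaffinity(0))  # e.g. {0, 1, ..., 7}
print("logical cores reported:", os.cpu_count())
```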


The GIL isn't relevant here. I'll get to that later, but first let's explain what is relevant.

You don't have 8 cores. You have 4 cores, each of which is hyperthreaded.

Modern cores have a whole lot of "superscalar" capacity. Often, the instructions queued up in a pipeline aren't independent enough to take full advantage of that capacity. What hyperthreading does is allow the core to fetch other instructions off a second pipeline when this happens, instructions which are virtually guaranteed to be independent. But it only allows that, it doesn't require it, because in some cases (which the CPU can usually judge better than you) the cost in cache locality would be worse than the gains in parallelism.

So, depending on the actual load you're running, with four hyperthreaded cores you may get a full 800% CPU usage, or you may only get 400%, or (pretty often) somewhere in between.

I'm assuming your system is configured to report 8 cores rather than 4 to userland, because that's the default, and that you have at least 8 processes, or a pool with the default proc count and at least 8 tasks. Obviously, if none of that is true, you can't possibly get 800% CPU usage…
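For reference, a sketch of what "default proc count" means here (the task body is a placeholder): a Pool created with no argument sizes itself to os.cpu_count(), so on a machine reporting 8 logical cores you get 8 workers, and you still need at least 8 tasks in flight to keep them all busy.

```python
# Pool() with no argument defaults to os.cpu_count() worker processes.
from multiprocessing import Pool
import os

def task(x):
    return sum(i * i for i in range(10_000_000))  # CPU-bound placeholder

if __name__ == "__main__":
    print(os.cpu_count())          # e.g. 8
    with Pool() as pool:           # 8 workers on this machine
        pool.map(task, range(8))   # 8 tasks -> a shot at ~800% CPU
```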

I'm also assuming you aren't using explicit locks, other synchronization, Manager objects, or anything else that will serialize your code. If you do, obviously you can't get full parallelism.
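As a hypothetical illustration of that kind of serialization (the worker body is made up): if every process holds the same shared lock for its whole computation, only one of them runs at a time no matter how many cores you have.

```python
# Four processes, but the shared Lock forces them to run one at a time.
from multiprocessing import Process, Lock

def worker(lock):
    with lock:                      # entire computation under the lock
        sum(i * i for i in range(10_000_000))

if __name__ == "__main__":
    lock = Lock()
    procs = [Process(target=worker, args=(lock,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```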

And I'm also assuming you aren't using (mutable) shared memory, like a multiprocessing.Array that everyone writes to. This can cause cache and page conflicts that can be almost as bad as explicit locks.
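A sketch of the shared-write pattern I mean (illustrative only): several workers hammering the same shared buffer, so even with lock=False the processes contend for the same cache lines and pages.

```python
# Every worker writes to every slot of one shared Array.
from multiprocessing import Process, Array

def writer(shared, start):
    for i in range(len(shared)):
        shared[i] = start + i       # all workers touch the same memory

if __name__ == "__main__":
    shared = Array('i', 10_000, lock=False)
    procs = [Process(target=writer, args=(shared, k)) for k in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```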


So, what's the deal with the GIL? Well, if you were running multiple threads within a process, and they were all CPU-bound, and they were all spending most of that time running Python code (as opposed to, say, spending most of that time running numpy operations that release the GIL), only one thread would run at a time. You could see:

  • 100% consistently on a single core, while the rest sit at 0%.
  • 100% ping-ponging between two or more cores, while the rest sit at 0%.
  • 100% ping-ponging between two or more cores, while the rest sit at 0%, but with some noticeable overlap where two cores at once are way over 0%. This last one might look like parallelism, but it isn't; that's just the switching overhead becoming visible.

But you're not running multiple threads, you're running separate processes, each of which has its own entirely independent GIL. And that's why you're seeing four cores at 100% rather than just one.
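As an aside on the numpy case mentioned above, here is a minimal sketch, assuming numpy is installed, of the one situation where threads in a single process can genuinely overlap on CPU-heavy work: large array operations release the GIL while they run, which a pure-Python loop never does.

```python
# Threads doing big numpy work can overlap, because numpy releases the GIL.
import threading
import numpy as np

def matmul_task():
    a = np.random.rand(2000, 2000)
    np.dot(a, a)                    # GIL released during the computation

threads = [threading.Thread(target=matmul_task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```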

The allocation of processes to CPUs / CPU cores is handled by the operating system.
