
Multiprocessing multithreading GIL?

For several days I have been doing a lot of research on multiprocessing and multithreading in Python, and I'm very confused about many things. I often see people say that the GIL is something that doesn't allow Python code to execute on several CPU cores, but when I write a program that creates many threads, I can see that several CPU cores are active.

1st question: What really is the GIL? How does it work? My guess is that when a process creates many threads, the OS distributes the work across multiple CPUs. Am I right?

Another thing: I want to take advantage of all my CPUs. I am thinking of creating as many processes as there are CPU cores and, in each process, as many threads as there are CPU cores. Am I on the right track?

To start with, the GIL only ensures that only one CPython bytecode instruction runs at any given time. It does not care which CPU core runs the instruction. That is the job of the OS kernel.
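To make "bytecode instruction" concrete, here is a minimal sketch that uses the standard `dis` module to list the instructions CPython generates for a small function. The function `double` is only an illustration, and the exact opcodes vary between Python versions.

```python
import dis


def double(i):
    return i * 2


# Print the bytecode instructions the interpreter executes for double().
dis.dis(double)

# The same instructions as objects, e.g. to inspect the opcode names.
opnames = [instr.opname for instr in dis.get_instructions(double)]
print(opnames)
```

Each entry in the listing is one of the instructions that the GIL serializes across threads.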

So going over your questions:

  1. The GIL is just a piece of code. The CPython virtual machine first compiles the source code to CPython bytecode, but its normal job is to interpret that bytecode. The GIL is a lock inside the interpreter that ensures only one bytecode instruction runs at a time, no matter how many threads are running. Bytecode instructions are what the virtual machine executes. So in effect, only one thread holds the GIL at any given point in time. (The interpreter also keeps releasing the GIL so that other threads get a turn and are not starved.)
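One way to observe this is to time a CPU-bound, pure-Python loop run sequentially and then run in several threads. The sketch below is only an illustration: the names `count_down`, `time_sequential`, and `time_threaded` and the loop sizes are assumptions, not part of the original answer. On a standard CPython build with the GIL enabled, the threaded run is typically no faster than the sequential one, although the exact numbers depend on the machine and the Python version.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n):
    """Pure-Python, CPU-bound loop; it needs the GIL for every step."""
    while n > 0:
        n -= 1
    return n


def time_sequential(n, repeats):
    """Run count_down `repeats` times in the calling thread; return seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, repeats):
    """Run count_down `repeats` times, one thread per call; return seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=repeats) as pool:
        list(pool.map(count_down, [n] * repeats))
    return time.perf_counter() - start


if __name__ == "__main__":
    n, repeats = 2_000_000, 4
    print(f"sequential: {time_sequential(n, repeats):.2f}s")
    print(f"threaded:   {time_threaded(n, repeats):.2f}s")
```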

Now coming to your actual confusion. You mention that when you run a program with many threads, you can see multiple (maybe all) CPU cores firing up. I did some experimentation and found that your observation is right (which is to be expected), but the behaviour is similar in a non-threaded version too.

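# Program 1: multithreaded, 20 worker threads
import time
from multiprocessing.pool import ThreadPool

def do_nothing(i):
    time.sleep(0.0001)
    return i*2

ThreadPool(20).map(do_nothing, range(10000))

# Program 2: single-threaded, same work in the main thread
import time

def do_nothing(i):
    time.sleep(0.0001)
    return i*2

[do_nothing(i) for i in range(10000)]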

The first one is multithreaded and the second one is not. When you compare the CPU usage of the two programs, you will find that in both cases multiple CPU cores fire up. So what you noticed, although right, does not have much to do with the GIL or threading. CPU usage goes up on multiple cores simply because the OS kernel distributes the execution of code to different cores based on availability.

Your last question is more of an experimental matter, as different programs have different CPU and I/O usage. You just have to be aware of the cost of creating a thread and a process, and of how the GIL and the Python virtual machine (PVM) work, and then tune the number of threads and processes to get the maximum performance.
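As a starting point for such experiments, here is a minimal sketch of spreading CPU-bound work over one process per core with the standard `concurrent.futures.ProcessPoolExecutor`. The helper names `cpu_task`, `default_workers`, and `run_in_processes` and the input sizes are assumptions for illustration only. One worker per core is just a common first guess, and the right number for a given program has to be measured.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def cpu_task(n):
    """CPU-bound work in pure Python: sum of squares below n."""
    return sum(i * i for i in range(n))


def default_workers():
    """One worker per CPU; os.cpu_count() may return None, so fall back to 1."""
    return os.cpu_count() or 1


def run_in_processes(inputs, workers=None):
    """Map cpu_task over inputs using a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=workers or default_workers()) as pool:
        return list(pool.map(cpu_task, inputs))


if __name__ == "__main__":
    # Each process has its own interpreter and its own GIL.
    print(run_in_processes([200_000] * 8))
```

The `if __name__ == "__main__":` guard matters here because worker processes may re-import the main module on platforms that start them with spawn.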

You can go through this talk by David Beazley to understand how multithreading can make your code perform worse (or better).

There are answers about what the Global Interpreter Lock (GIL) is here. Buried among the answers is a mention of Python "bytecode", which is central to the issue. When your program is compiled, the output is bytecode, i.e. low-level instructions for a fictitious "Python" computer, which are interpreted by the Python interpreter. When the interpreter executes bytecode, it serializes execution by acquiring the Global Interpreter Lock. This means that two threads cannot execute bytecode concurrently on two different cores. It also means that true multithreading is not implemented. But does this mean that there is no reason to use threading? No. Here are a couple of situations where threading is still useful:

  1. For certain operations the interpreter releases the GIL, e.g. when doing I/O. Consider the case where you want to fetch a lot of URLs from different websites. Most of the time is spent waiting for a response after the request is made, and this waiting can be overlapped across threads even if formulating the requests has to be done serially.
  2. Many Python functions and modules are implemented in C and can release the GIL while they run, so they are not limited by it in the same way. The numpy module is one such highly optimized package.

Consequently, threading is best used when the tasks are not CPU-intensive, i.e. when they spend a lot of time waiting for I/O to complete, sleeping, and so on.
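The overlap of waiting time can be sketched as follows. This is an illustration under stated assumptions: `fake_io` uses `time.sleep` as a stand-in for a blocking I/O call such as fetching a URL, and the helper names and delays are hypothetical. Since sleeping releases the GIL, the threaded run should take roughly as long as the longest single wait rather than the sum of all waits.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(delay)
    return delay


def run_sequential(delays):
    """Wait for each delay in turn; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [fake_io(d) for d in delays]
    return results, time.perf_counter() - start


def run_threaded(delays):
    """Wait for all delays at once, one thread each; return (results, elapsed)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_io, delays))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.2] * 5
    _, seq = run_sequential(delays)
    _, thr = run_threaded(delays)
    print(f"sequential: {seq:.2f}s, threaded: {thr:.2f}s")
```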
