Multithreading accelerates CPU-bound tasks despite the GIL
I recently learned about the GIL in Python. I was running some benchmarks and found that multithreading actually does improve performance. I compare element-wise NumPy operations that do not use any internal multithreading. In the first test, I call a function 32 times sequentially from a for loop. In the second test, I run the same 32 calls in separate threads. If the GIL were in effect, only one thread should be active at a time in the second case, so the execution times should be approximately equal (or even worse in the second case because of threading overhead). This is not what I observed.
import threading
import time

import numpy as np


def elemntwiseoperations(a, b):
    # Element-wise NumPy work; the result is discarded, only the timing matters.
    np.exp(a) + np.sin(b)


N = 1024
a = np.random.rand(N, N)
b = np.random.rand(N, N)
NoTasks = 32

# Test 1: call the function sequentially from a for loop.
start_time = time.time()
for i in range(NoTasks):
    elemntwiseoperations(a, b)
elapsed = time.time() - start_time
print("Execution time for {} tasks: {} seconds, {} seconds per task".format(NoTasks, elapsed, elapsed / NoTasks))

# Test 2: run each call in its own thread, then wait for all of them.
threads = []
start_time = time.time()
for i in range(NoTasks):
    x = threading.Thread(target=elemntwiseoperations, name='{}'.format(i), args=(a, b))
    x.start()
    threads.append(x)
for thread in threads:
    thread.join()
elapsed = time.time() - start_time
print("Execution time for {} tasks: {} seconds, {} seconds per task".format(NoTasks, elapsed, elapsed / NoTasks))
Output:
Execution time for 32 tasks: 0.5654711723327637 seconds, 0.01767103374004364 seconds per task
Execution time for 32 tasks: 0.17153215408325195 seconds, 0.005360409617424011 seconds per task
PS: macOS, Python 3.7.6, CPython implementation.
So, my current best guess is the following. In the first case, one thread starts the C routines sequentially and waits for each one to finish before starting the next. Since I only use element-wise operations that are not parallelized in NumPy, only one thread is involved in the whole process.
In the second case, I start 32 threads, each of which is subject to the GIL. The first thread starts its C routine and hands the GIL to the second thread, then the second thread starts its C routine and hands control to the third thread, and so on. Even though the C routines are not started at the same time, they all execute concurrently, because the C code is not affected by the GIL.
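One way to probe this guess is a control experiment: run the same sequential-versus-threaded comparison on a function that never leaves the interpreter. The sketch below is only an illustration. `pure_python_task`, `numpy_task`, `run_sequential` and `run_threaded` are hypothetical names that are not in the original post, and the array size and task count are arbitrary. It assumes that NumPy releases the GIL inside its element-wise loops for numeric arrays. If that holds, the threaded NumPy run should get faster while the threaded pure-Python run should not.

```python
import threading
import time

import numpy as np


def pure_python_task(n):
    # Runs entirely in interpreter bytecode, so the thread holds the GIL throughout.
    total = 0
    for i in range(n):
        total += i * i
    return total


def numpy_task(a, b):
    # Same element-wise expression as in the question; the loops run in C.
    return np.exp(a) + np.sin(b)


def run_sequential(func, args, n_tasks):
    # Call func n_tasks times in the current thread and return elapsed seconds.
    start = time.perf_counter()
    for _ in range(n_tasks):
        func(*args)
    return time.perf_counter() - start


def run_threaded(func, args, n_tasks):
    # Run each call in its own thread and return elapsed seconds for all of them.
    threads = [threading.Thread(target=func, args=args) for _ in range(n_tasks)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n_tasks = 8  # arbitrary, kept small so the demo finishes quickly
    a = np.random.rand(512, 512)
    b = np.random.rand(512, 512)
    for label, func, args in [
        ("pure Python", pure_python_task, (200_000,)),
        ("NumPy", numpy_task, (a, b)),
    ]:
        seq = run_sequential(func, args, n_tasks)
        thr = run_threaded(func, args, n_tasks)
        print("{}: sequential {:.3f} s, threaded {:.3f} s".format(label, seq, thr))
```

The exact timings depend on the machine and the NumPy build, so no expected output is given. What matters is the direction of the difference between the two rows.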
I don't know how to actually check this, but it is how I understand it after reading a couple of Python blogs about the GIL.
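A possible empirical check is to fix the amount of work and vary only the number of worker threads. If the elapsed time drops as workers are added, up to roughly the number of CPU cores, the tasks are probably running in parallel. If it stays flat, they are probably being serialized by the GIL. The sketch below uses hypothetical helper names (`time_with_workers`, `scaling_table`) and takes any zero-argument callable, so the question's function could be passed as `lambda: elemntwiseoperations(a, b)`. To stay within the standard library, the demo uses `hashlib.sha256` on a large buffer as a stand-in, since `hashlib` is documented to release the GIL for large inputs. It is not the NumPy workload from the question.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def time_with_workers(task, n_tasks, n_workers):
    """Run task n_tasks times on a pool of n_workers threads; return seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(task) for _ in range(n_tasks)]
        for f in futures:
            f.result()  # re-raises any exception from the worker thread
    return time.perf_counter() - start


def scaling_table(task, n_tasks, worker_counts):
    """Return a list of (n_workers, elapsed_seconds) pairs for the same workload."""
    return [(w, time_with_workers(task, n_tasks, w)) for w in worker_counts]


if __name__ == "__main__":
    payload = b"x" * (8 * 1024 * 1024)  # 8 MiB buffer, size chosen arbitrarily

    def stand_in_task():
        # Stand-in for a C routine that releases the GIL (an assumption of this demo).
        hashlib.sha256(payload).digest()

    for workers, elapsed in scaling_table(stand_in_task, 16, [1, 2, 4, 8]):
        print("{} worker(s): {:.3f} s".format(workers, elapsed))
```

A thread pool is used here instead of one thread per task so that the degree of concurrency can be controlled directly.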