
Python: multithreading doesn't improve run time

I'm trying to write a simple multi-threaded Python script:

import glob
import itertools
import os
from shutil import copyfile

from multiprocessing.dummy import Pool as ThreadPool
from skimage import io, transform

# img_file_extension and copyLabels are module-level settings defined elsewhere

def resize_img_folder_multithreaded(img_fldr_src, img_fldr_dst, max_num_of_thread):

    images = glob.glob(img_fldr_src + '/*.' + img_file_extension)
    pool = ThreadPool(max_num_of_thread)

    pool.starmap(resize_img, zip(images, itertools.repeat(img_fldr_dst)))
    # close the pool and wait for the work to finish
    pool.close()
    pool.join()


def resize_img(img_path_src, img_fldr_dest):
    #print("about to resize image=", img_path_src)
    image = io.imread(img_path_src)
    image = transform.resize(image, [300, 300])
    io.imsave(os.path.join(img_fldr_dest, os.path.basename(img_path_src)), image)
    label = img_path_src[:-4] + '.xml'
    if copyLabels is True and os.path.exists(label) is True:
        copyfile(label, os.path.join(img_fldr_dest, os.path.basename(label)))

Setting the argument max_num_of_thread to any number in [1...10] doesn't improve my run time at all (for 60 images it stays around 30 sec), and with max_num_of_thread=10 my PC got stuck.

My question is: what is the bottleneck in my code, and why can't I see any improvement?

Some data about my PC:

python -V
Python 3.6.4 :: Anaconda, Inc.


cat /proc/cpuinfo | grep 'processor' | wc -l
4

cat /proc/meminfo 
MemTotal:        8075960 kB
MemFree:         3943796 kB
MemAvailable:    4560308 kB

cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=17.10

Blame the GIL.

Python has a mechanism called the GIL, the global interpreter lock. It is basically a mutex that prevents native threads from executing Python bytecode at the same time. This must be done because Python's (at least CPython's) memory management is not thread-safe.

In other words, the GIL will prevent you from running multiple threads at the same time; essentially, you're running one thread at a time. Multi-threading, in the sense of exploiting multiple CPU cores, is more like an illusion in Python.

Fortunately, there is a way to solve this problem, though it's a bit more expensive resource-wise: you can use multiprocessing instead. Python has excellent support for this through the multiprocessing module. This way, you will be able to achieve parallelism [1].

You might ask why multiprocessing isn't affected by the GIL's limitations. The answer is pretty simple: each new process of your program gets its own instance of the Python interpreter, which means each process has its own GIL. So the processes are not constrained by a shared GIL but are scheduled by the OS itself. This gives you parallelism [2].
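The switch from the thread pool above to real processes is mechanical, since multiprocessing.Pool exposes the same starmap API. A minimal sketch; cpu_bound is an illustrative CPU-heavy stand-in for the image-resizing worker, not code from the original post:

```python
import itertools
from multiprocessing import Pool

def cpu_bound(n, scale):
    # CPU-heavy stand-in for resizing one image.
    return scale * sum(i * i for i in range(n))

if __name__ == "__main__":
    # Roughly one process per CPU core; same starmap call as with ThreadPool.
    with Pool(processes=4) as pool:
        results = pool.starmap(cpu_bound, zip([10, 20, 30], itertools.repeat(2)))
    print(results)
```

The `if __name__ == "__main__":` guard matters here: on platforms that spawn rather than fork (Windows, recent macOS), child processes re-import the module, and the guard stops them from recursively creating pools.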


References

The problem comes from the Global Interpreter Lock, or GIL. The GIL only lets one thread run at a time, so if you want to do parallel computation, use multiprocessing.Pool:

import multiprocessing

pool = multiprocessing.Pool(max_num_of_process)  # use the number of CPU cores as the max

Note: multiprocessing.dummy is a wrapper around the threading module; it lets you interact with a thread pool as if you were using a process Pool.
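Because the two pools share one API, swapping threads for processes is a one-line change of import. A small illustration (square is a toy payload, not from the original post):

```python
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed pool
from multiprocessing import Pool as ProcessPool       # process-backed pool

def square(x):
    return x * x

if __name__ == "__main__":
    with ThreadPool(4) as tp:
        print(tp.map(square, range(5)))   # [0, 1, 4, 9, 16]
    with ProcessPool(4) as pp:
        print(pp.map(square, range(5)))   # [0, 1, 4, 9, 16]
```

For CPU-bound work like image resizing, only the second pool runs the calls in parallel; the first is serialized by the GIL.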

You should only use multiprocessing with the number of CPU cores you have available. You are also not using a Queue, so the pool of resources is doing the same work; you need to add a queue to your code.

Filling a queue and managing multiprocessing in python 填充队列并在python中管理多处理
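A sketch of the queue-based layout that answer suggests: workers pull tasks from a shared multiprocessing.Queue and stop on a sentinel. The worker function and the integer payload are illustrative stand-ins for resizing one image:

```python
from multiprocessing import Process, Queue

def worker(tasks, results):
    while True:
        item = tasks.get()
        if item is None:          # sentinel: no more work
            break
        results.put(item * item)  # stand-in for resizing one image

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    procs = [Process(target=worker, args=(tasks, results)) for _ in range(2)]
    for p in procs:
        p.start()
    for n in range(5):
        tasks.put(n)
    for _ in procs:
        tasks.put(None)           # one sentinel per worker
    for p in procs:
        p.join()
    print(sorted(results.get() for _ in range(5)))
```

With this layout, faster workers simply pull more items from the queue, instead of each worker being handed a fixed slice of the image list up front.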
