Multiprocessing is slower than threading in Python

I tested multiprocessing and threading in Python, but the multiprocessing version is slower than the threaded one. I calculate distances using editdistance; my code looks like this:

import time
from multiprocessing import Pool

import editdistance

def calc_dist(kw, trie_word):
    dists = []

    while len(trie_word) != 0:
        w = trie_word.pop()
        dist = editdistance.eval(kw, w)
        dists.append((w, dist))

    return dists

if __name__ == "__main__":
    word_list = [str(i) for i in range(1, 10000001)]
    key_word = '2'
    print("calc")
    s = time.time()
    with Pool(processes=4) as pool: 
        result = pool.apply_async(calc_dist, (key_word, word_list)) 
        print(len(result.get())) 
    print("elapsed", time.time() - s)

Using threading:

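import threading

class DistThread(threading.Thread):
    def __init__(self, func, args):
        super().__init__()
        self.func = func
        self.args = args
        self.dists = None

    def run(self):
        # Store the result so join() can hand it back.
        self.dists = self.func(*self.args)

    def join(self, timeout=None):
        # Thread.join() takes only an optional timeout; self is passed implicitly.
        super().join(timeout)
        return self.dists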

On my computer, the multiprocessing version takes about 118 s, while the threaded version takes about 36 s. What is wrong with it?

A couple of issues:

  1. A significant amount of time is spent serialising the data so it can be sent to the other process. Threads share the same address space, so they can pass pointers instead of copying.

  2. Your current multiprocessing code uses only one process to do all the calculations: a single apply_async call submits one task, which runs on one worker. You need to separate your array into "chunks" somehow so that it can be processed by multiple workers.

For example:

import time
from multiprocessing import Pool
import editdistance

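# Module-level so worker processes can see it under any start method: with
# "spawn", workers re-import this module and skip the __main__ block.
key_word = '2'

def calc_one(trie_word):
    return editdistance.eval(key_word, trie_word)

if __name__ == "__main__":
    word_list = [str(i) for i in range(1, 10000001)]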

    print("calc")
    s = time.time()
    with Pool(processes=4) as pool: 
        result = pool.map(calc_one, word_list, chunksize=10000) 
        print(len(result))
    print("time",time.time()-s)

    # Single-process baseline for comparison.
    s = time.time()
    result = [calc_one(w) for w in word_list]
    print(len(result))
    print("time",time.time()-s)

This relies on key_word being a global variable. For me, the version using multiple processes takes ~5.3 seconds, while the single-process version takes ~16.9 seconds. That is not 4 times as fast, because the data still needs to be sent back and forth, but it is pretty good.
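If relying on a global is undesirable, the keyword can be bound explicitly instead. The following is only a minimal sketch under stated assumptions: levenshtein is a hypothetical pure-Python stand-in for editdistance.eval (so the snippet runs without the third-party package), parallel_distances is an illustrative helper name, and the word list is shortened for a quick run.

```python
import time
from functools import partial
from multiprocessing import Pool


def levenshtein(a, b):
    """Pure-Python edit distance; a stand-in for editdistance.eval (assumption)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def calc_one(key_word, trie_word):
    # key_word is an explicit argument, so no global is needed.
    return levenshtein(key_word, trie_word)


def parallel_distances(key_word, words, processes=4, chunksize=10000):
    # partial() binds key_word; the result is picklable because calc_one
    # is a module-level function.
    worker = partial(calc_one, key_word)
    with Pool(processes=processes) as pool:
        return pool.map(worker, words, chunksize=chunksize)


if __name__ == "__main__":
    # Shorter list than in the answer above, to keep the demo quick.
    word_list = [str(i) for i in range(1, 100001)]
    s = time.time()
    result = parallel_distances('2', word_list)
    print(len(result), "time", time.time() - s)
```

The bound keyword is pickled along with each chunk of work, which is cheap for a short string; for a large shared object, a Pool initializer that sets it once per worker would likely be the better fit.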

I had a similar experience with threading and multiprocessing in Python when consuming CSVs that held a large amount of data. I looked into it briefly and found that multiprocessing spawns multiple processes to perform tasks, which can be slower than running a single multi-threaded process, since the threads all run inside one process. There is a more definitive answer here: Multiprocessing vs Threading Python.

Pasting the answer from the link in case the link disappears:

The threading module uses threads, the multiprocessing module uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. This is what the global interpreter lock is for.
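The shared-versus-separate memory distinction can be illustrated with a small sketch. It makes two assumptions: a pickle round trip is used to imitate the copy an object undergoes when multiprocessing sends it to a worker (no real worker process is started), and the names counter, add_many and copy_in_worker are purely illustrative.

```python
import pickle
import threading

counter = {"value": 0}    # lives in this process's memory
lock = threading.Lock()   # precaution against simultaneous writes


def add_many(state, n):
    for _ in range(n):
        # "+=" on a dict entry is a read-modify-write, so guard it.
        with lock:
            state["value"] += 1


# Every thread receives a reference to the same dict object.
threads = [threading.Thread(target=add_many, args=(counter, 10000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("after threads:", counter["value"])

# multiprocessing pickles arguments before handing them to a worker, so the
# worker would operate on a copy. The round trip below imitates that boundary.
copy_in_worker = pickle.loads(pickle.dumps(counter))
copy_in_worker["value"] += 1
print("original:", counter["value"], "copy:", copy_in_worker["value"])
```

Changes made by the threads all land in the one dict, whereas the change made to the pickled copy never reaches the original. That is why results from worker processes have to be sent back explicitly.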

Spawning processes is a bit slower than spawning threads. Once they are running, there is not much difference.
