Parallelizing through Multi-threading and Multi-processing taking significantly more time than serial
I am trying to learn how to do parallel programming in Python. I wrote a simple function that squares ints, then ran it serially, multi-threaded, and multi-processed:
import time
import multiprocessing, threading
import random

def calc_square(numbers):
    sq = 0
    for n in numbers:
        sq = n*n

def splita(list, n):
    a = [[] for i in range(n)]
    counter = 0
    for i in range(0, len(list)):
        a[counter].append(list[i])
        if len(a[counter]) == len(list)/n:
            counter = counter + 1
            continue
    return a

if __name__ == "__main__":
    random.seed(1)
    arr = [random.randint(1, 11) for i in xrange(1000000)]
    print "init completed"

    start_time2 = time.time()
    calc_square(arr)
    end_time2 = time.time()
    print "serial: " + str(end_time2 - start_time2)

    newarr = splita(arr, 8)
    print 'split complete'

    start_time = time.time()
    for i in range(8):
        t1 = threading.Thread(target=calc_square, args=(newarr[i],))
        t1.start()
        t1.join()
    end_time = time.time()
    print "mt: " + str(end_time - start_time)

    start_time = time.time()
    for i in range(8):
        p1 = multiprocessing.Process(target=calc_square, args=(newarr[i],))
        p1.start()
        p1.join()
    end_time = time.time()
    print "mp: " + str(end_time - start_time)
Output:
init completed
serial: 0.0640001296997
split complete
mt: 0.0599999427795
mp: 2.97099995613
However, as you can see, something strange is happening: mt takes the same time as serial, and mp actually takes longer (almost 50 times longer).
What am I doing wrong? Can someone nudge me in the right direction to learn parallel programming in Python?
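One thing worth pointing out about the loops above: calling join() right after start() inside the loop waits for each worker to finish before the next one is launched, so the eight workers actually run one at a time. A start-all-then-join-all version of the thread loop might look like this (a sketch reusing calc_square, with a small stand-in for newarr):

```python
import threading

def calc_square(numbers):
    sq = 0
    for n in numbers:
        sq = n*n

# stand-in for the 8 sublists produced by splita above
newarr = [[1, 2, 3, 4]] * 8

threads = [threading.Thread(target=calc_square, args=(part,)) for part in newarr]
for t in threads:
    t.start()   # launch every worker before waiting on any of them
for t in threads:
    t.join()    # now wait for all workers to finish
```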
Edit 01
Looking at the comments, I realized that a function that returns nothing may seem pointless. The reason I tried this at all is that earlier I had tried the following add function:
def addi(numbers):
    sq = 0
    for n in numbers:
        sq = sq + n
    return sq
I tried returning each part's sum to a serial adder, so that at least I could see some performance improvement over a pure serial implementation. However, I couldn't figure out how to store and use the returned values, which is why I tried to find something even simpler than that: just splitting the array and running a simple function over it.
Thanks!
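A side note on the "store and use the returned values" point: one way to get results out of worker processes is to have each worker put its partial sum on a multiprocessing.Queue and combine them in the parent. A minimal Python 3 sketch of that idea, using the addi logic above (the round-robin split here is a stand-in for splita):

```python
import multiprocessing
import random

def addi_q(numbers, q):
    sq = 0
    for n in numbers:
        sq = sq + n
    q.put(sq)          # send the partial sum back instead of returning it

if __name__ == "__main__":
    random.seed(1)
    arr = [random.randint(1, 11) for i in range(1000000)]
    chunks = [arr[i::8] for i in range(8)]    # round-robin split into 8 parts
    q = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=addi_q, args=(c, q)) for c in chunks]
    for p in procs:
        p.start()
    total = sum(q.get() for _ in range(8))    # one result per process
    for p in procs:
        p.join()
    print(total == sum(arr))                  # the partial sums combine to the serial result
```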
I think multiprocessing takes a long time to create and start each process. I changed the program so that arr is 10 times the size, changed how the processes are started, and it sped up a bit:
(also note Python 3)
import time
import multiprocessing, threading
from multiprocessing import Queue
import random

def calc_square_q(numbers, q):
    while q.empty():
        pass
    return calc_square(numbers)

if __name__ == "__main__":
    random.seed(1)  # note how big arr is now vvvvvvv
    arr = [random.randint(1, 11) for i in range(10000000)]
    print("init completed")

    # ...
    # other stuff as before
    # ...

    processes = []
    q = Queue()
    for arrs in newarr:
        processes.append(multiprocessing.Process(target=calc_square_q, args=(arrs, q)))

    print('start processes')
    for p in processes:
        p.start()     # even tho' each process is started it waits...

    print('join processes')
    q.put(None)       # ... for q to become not empty.
    start_time = time.time()
    for p in processes:
        p.join()
    end_time = time.time()
    print("mp: " + str(end_time - start_time))
Also note, as commented above, how I create and start the processes in two different loops, and then finally join the processes in a third loop.
Output:
init completed
serial: 0.53214430809021
split complete
start threads
mt: 0.5551605224609375
start processes
join processes
mp: 0.2800724506378174
Increasing the size of arr by another factor of 10:
init completed
serial: 5.8455305099487305
split complete
start threads
mt: 5.411392450332642
start processes
join processes
mp: 1.9705185890197754
And yes, I tried this in Python 2.7 as well, although the threads seemed slower there.
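A footnote to the answer above: the per-process startup cost it describes can also be amortized by reusing a fixed set of workers through multiprocessing.Pool, which additionally collects the return values. A minimal Python 3 sketch (the round-robin split is a stand-in for splita):

```python
import time
import random
import multiprocessing

def addi(numbers):
    sq = 0
    for n in numbers:
        sq = sq + n
    return sq

if __name__ == "__main__":
    random.seed(1)
    arr = [random.randint(1, 11) for i in range(1000000)]
    chunks = [arr[i::8] for i in range(8)]   # round-robin split, a stand-in for splita

    start = time.time()
    serial = addi(arr)
    print("serial: " + str(time.time() - start))

    start = time.time()
    with multiprocessing.Pool(processes=8) as pool:   # workers are created once here...
        partial = pool.map(addi, chunks)              # ...then reused for every chunk
    print("pool:   " + str(time.time() - start))
    assert sum(partial) == serial    # the partial sums combine to the serial result
```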