Python multiprocessing code running slower than single threaded one
The Python multiprocessing performance on my i7 7700HQ is significantly slower than the non-parallel version.

While planning to parallelize the Select and Update code for my single-table database in mssql, I first tried to parallelize a simple piece of code. The program simply prints multiples of the argument. I tried single-threaded, multi-process with the Process object, and also with the Pool object. Single-threaded always performed best.
import time
from multiprocessing import Process

def foobar(a):
    for i in range(1, 10000):
        print(a * i)
    return

if __name__ == "__main__":
    Tthreading = time.clock()

    p1 = Process(target=foobar, args=(3,))
    p2 = Process(target=foobar, args=(2,))
    p3 = Process(target=foobar, args=(4,))
    p4 = Process(target=foobar, args=(123,))

    allprocess = [p1, p2, p3, p4]
    for p in allprocess:
        p.start()
    for p in allprocess:
        p.join()

    print(time.clock() - Tthreading)

    # Single-threaded
    Tsingle = time.clock()
    foobar(3)
    foobar(2)
    foobar(4)
    foobar(123)
    print(time.clock() - Tsingle)
I expected the multi-process version to be much faster, since there are no shared resources (no functions or variables that need to be accessed between processes) and no IPC.
Single-threaded time: 0.32s
Multi-process time: 0.53s
Actually, there is one important shared resource in your example: your monitor (or stdout). print is a relatively slow operation (compared to CPU cycles...), and it causes contention between your processes.
Benchmarking parallel work correctly is a tough task; it is affected by a great many factors and features of a CPU (e.g. the cache).

Try to replace your workload with one that is well suited to multiprocessing (e.g. working in parallel on different parts of an array, matrix multiplication...).
One more important thing: spawning the new processes also takes time, and for it to pay off, the work done in each process needs to be significant. If you increase your loop's range a little, the difference should be in favor of the multi-process version:
import time
from multiprocessing import Process

def foobar(a):
    for i in range(1, 10000000):
        a * i
    return

if __name__ == "__main__":
    Tthreading = time.time()

    p1 = Process(target=foobar, args=(3,))
    p2 = Process(target=foobar, args=(2,))
    p3 = Process(target=foobar, args=(4,))
    p4 = Process(target=foobar, args=(123,))

    allprocess = [p1, p2, p3, p4]
    for p in allprocess:
        p.start()
    for p in allprocess:
        p.join()

    print(time.time() - Tthreading)

    # Single-threaded
    Tsingle = time.time()
    foobar(3)
    foobar(2)
    foobar(4)
    foobar(123)
    print(time.time() - Tsingle)
On my machine this outputs:
0.44509196281433105
1.3775699138641357