Using multiple CPUs for a very fast function in Python
What I have is a function that can be looped 10,000 times in 1 second. However, I need to perform this function tens to hundreds of millions of times. As expected, due to the overhead of inter-process communication, using the native multiprocessing package with my 4 cores slows the 10k loop to 1.5 seconds. Using the chunksize parameter in multiprocessing helped only trivially. Is there any way to get multiple processes to call this function with a speedup greater than the overhead?
A truncated version of the function:
import numpy as np

rands = np.random.random((200, 1000000))

def randfunc(i):
    Q = np.concatenate([rands[:, [i]], rands[:, [i]] * rands[:, [i + 1]]], axis=1)
    Q2 = np.dot(np.transpose(Q), Q)
    Q3 = np.linalg.inv(Q2) * Q2[1, 1]
    return Q3
I was able to cut the run time in half by using the ipyparallel package, simply using map_sync instead of the multiprocessing package to parallelize the function. I'm not really sure why the former package has less overhead than the latter, but with ipyparallel loading the data did take a long time, whereas multiprocessing recognizes rands as a variable during parallel execution. However, in both cases the data are stored in RAM. If anyone reads this and knows the reason why ipyparallel is faster, do comment.
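The ipyparallel approach mentioned above might be sketched like this. It assumes ipyparallel >= 7, which can start a local cluster programmatically; with a cluster already started via `ipcluster start`, you would connect with `ipp.Client()` instead. The engine count and array size are illustrative, and the sketch is guarded so it is skipped if ipyparallel is not installed:

```python
import numpy as np

rands = np.random.random((200, 1000))

def randfunc(i):
    Q = np.concatenate([rands[:, [i]], rands[:, [i]] * rands[:, [i + 1]]], axis=1)
    Q2 = np.dot(np.transpose(Q), Q)
    return np.linalg.inv(Q2) * Q2[1, 1]

results = None
try:
    import ipyparallel as ipp
except ImportError:
    ipp = None  # ipyparallel not installed; this remains a sketch

if __name__ == "__main__" and ipp is not None:
    # Start a small local cluster in-process (ipyparallel >= 7 API).
    with ipp.Cluster(n=2) as rc:
        view = rc[:]                          # a DirectView over all engines
        view.execute("import numpy as np")    # engines need numpy imported
        view.push({"rands": rands})           # ship the shared array to each engine
        results = view.map_sync(randfunc, range(999))
```

Note the explicit push of rands: unlike fork-based multiprocessing, the engines are separate long-lived processes that do not inherit the parent's globals, which matches the observation above that loading the data into ipyparallel took a long time.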