
Using multiple CPUs for a very fast function in Python

What I have is a function that can be looped 10,000 times in 1 second. However, I need to perform this function tens to hundreds of millions of times. As expected, because of the overhead of multiprocessing, using the native multiprocessing package with my 4 cores actually slows the 10k loop to 1.5 seconds. Using the chunksize parameter in multiprocessing helped only trivially. Is there any way to get multiple processes to call this function with a speedup greater than the overhead?

A truncated version of the function:

import numpy as np

rands = np.random.random((200, 1000000))

def randfunc(i):
    # pair column i with its elementwise product against column i+1
    Q = np.concatenate([rands[:, [i]], rands[:, [i]] * rands[:, [i + 1]]], axis=1)
    # 2x2 Gram matrix of the two columns
    Q2 = np.dot(np.transpose(Q), Q)
    # invert it and scale by its [1, 1] entry
    Q3 = np.linalg.inv(Q2) * Q2[1, 1]
    return Q3
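
A minimal sketch of how the multiprocessing attempt might look (the pool size, index range, and chunksize value are assumptions, since the exact call is not shown in the question; the workers are assumed to inherit rands from the parent process via fork on Linux/macOS):

from multiprocessing import Pool

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # chunksize batches the indices so each worker receives fewer, larger tasks
        results = pool.map(randfunc, range(10000), chunksize=500)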

I was able to cut the run time in half by using the ipyparallel package, simply using map_sync instead of the multiprocessing package to parallelize the function. I'm not really sure why the former package has less overhead than the latter, but with the former, loading the data onto the engines did take a long time, whereas the latter simply recognizes rands as a variable during parallel execution. However, in both cases the data are stored in RAM. If anyone reads this and knows the reason why ipyparallel is faster, do comment.
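
A minimal sketch of the ipyparallel approach described above (the engine count, index range, and the explicit push of rands are assumptions; it presumes a cluster has already been started, e.g. with ipcluster start -n 4):

import ipyparallel as ipp

rc = ipp.Client()                    # connect to the running cluster
dview = rc[:]                        # direct view over all engines
dview.block = True                   # make execute/push wait for completion

dview.execute('import numpy as np')  # make numpy available in each engine's namespace
dview.push({'rands': rands})         # ship the array to every engine (the slow loading step)

results = dview.map_sync(randfunc, range(10000))  # blocking map across the engines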
