Efficient way to calculate on all row-pairs of a large matrix?
I need to do some time-consuming calculations on all row-pairs of a large matrix M, like:
for i in range(n):
    for j in range(i + 1, n):
        time_consuming_calculation(M[i, :], M[j, :])
Since I am new to parallel computing, after studying the example in Writing parallel computation results in shared memory, I tried to do the computation in parallel with joblib as below:
dump(M, M_name)
M = load(M_name, mmap_mode='r')
...
Parallel(n_jobs=num_cores)(delayed(paracalc)(u, v, M)
                           for u, v in itertools.combinations(range(M.shape[0]), 2))
However, it turned out to be unbearably slower than the non-parallel version. Computing each row-pair took even more seconds than with num_cores=1. I am wondering what's wrong with my parallel implementation. Is mpi4py a better choice? Any suggestions will be appreciated.
Okay, still no answers, but I've managed to work it out. The first interesting fact I found is that when I commented out these two lines,
# dump(M, M_name)
# M = load(M_name, mmap_mode='r')
so that the memmap array was no longer used in place of the in-memory array, it ran much faster. I still don't know why. Is there a memmap lock or something?
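For reference, a minimal runnable sketch of the faster variant, passing rows of the in-memory array directly to joblib without the dump()/load() memmap round-trip. The per-pair function here (`pair_distance`) is a hypothetical stand-in for the real time-consuming calculation:

```python
import itertools

import numpy as np
from joblib import Parallel, delayed

def pair_distance(u, v):
    # Hypothetical stand-in for the expensive per-pair computation.
    return float(np.linalg.norm(u - v))

M = np.random.rand(10, 5)
pairs = list(itertools.combinations(range(M.shape[0]), 2))

# Rows are passed directly; no memmap file is created or opened.
results = Parallel(n_jobs=2)(
    delayed(pair_distance)(M[i, :], M[j, :]) for i, j in pairs
)
```

Note that for a cheap per-pair function like this one, process start-up and pickling overhead can still dominate; parallelism only pays off when each call is genuinely expensive.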
Then, I read the article Parallel and HPC with Python (or numpy) and decided to turn to mpi4py. After hours of struggling with debugging, I got satisfying results.
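The core of an mpi4py port is deciding which rank computes which row-pairs. A round-robin split of the pair indices can be written and tested as plain Python; the MPI driver below it is a hedged sketch (`time_consuming_calculation` is the user's own function, and running it requires mpi4py and `mpiexec`):

```python
from itertools import combinations

def split_pairs(n, size, rank):
    """Round-robin assignment of the row-pair indices (i, j), i < j < n,
    to `size` workers; returns the pairs owned by worker `rank`."""
    return [p for k, p in enumerate(combinations(range(n), 2)) if k % size == rank]

# Hypothetical MPI driver (run with e.g. `mpiexec -n 4 python script.py`):
# from mpi4py import MPI
# comm = MPI.COMM_WORLD
# my_pairs = split_pairs(M.shape[0], comm.Get_size(), comm.Get_rank())
# local = [time_consuming_calculation(M[i, :], M[j, :]) for i, j in my_pairs]
# results = comm.gather(local, root=0)  # collect per-rank results on rank 0
```

Because every rank holds its own copy of M (or reads it independently), this avoids the shared-memmap contention suspected above.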