Efficient way to calculate on all row-pairs of a large matrix?
I need to do some time-consuming calculations on all row-pairs of a large matrix M, like:
for i in range(n):
    for j in range(i + 1, n):
        time_consuming_calculation(M[i, :], M[j, :])
Since I am new to parallel computing, after studying the example in Writing parallel computation results in shared memory, I tried to do the computation in parallel with joblib as below:
dump(M, M_name)
M = load(M_name, mmap_mode='r')
...
Parallel(n_jobs=num_cores)(delayed(paracalc)(u, v, M)
                           for u, v in itertools.combinations(range(M.shape[0]), 2))
However, it turned out to be unbearably slower than the non-parallel version. Computing each row-pair took even more seconds than with num_cores=1. I am wondering what's wrong with my parallel implementation. Is mpi4py a better choice? Any suggestions will be appreciated.
Okay, still no answers, but I've managed to work it out. The first interesting fact I found is that when I commented out these two lines,
# dump(M, M_name)
# M = load(M_name, mmap_mode='r')
so that the memmap array was no longer used in place of the in-memory array, it ran much faster. I still don't know why. Is there a memmap lock or something?
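For reference, a minimal runnable sketch of the faster variant, passing rows of the in-memory array directly to joblib without the dump()/load() memmap round-trip. The per-pair function here (`pair_distance`) is a hypothetical stand-in for the real time-consuming calculation:

```python
import itertools

import numpy as np
from joblib import Parallel, delayed

def pair_distance(u, v):
    # Hypothetical stand-in for the expensive per-pair computation.
    return float(np.linalg.norm(u - v))

M = np.random.rand(10, 5)
pairs = list(itertools.combinations(range(M.shape[0]), 2))

# Rows are passed directly; no memmap file is created or opened.
results = Parallel(n_jobs=2)(
    delayed(pair_distance)(M[i, :], M[j, :]) for i, j in pairs
)
```

Note that for a cheap per-pair function like this one, process start-up and pickling overhead can still dominate; parallelism only pays off when each call is genuinely expensive.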
Then, I read the article Parallel and HPC with Python (or numpy) and decided to turn to mpi4py. After hours of struggling with debugging, I got satisfying results.
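The core of an mpi4py port is deciding which rank computes which row-pairs. A round-robin split of the pair indices can be written and tested as plain Python; the MPI driver below it is a hedged sketch (`time_consuming_calculation` is the user's own function, and running it requires mpi4py and `mpiexec`):

```python
from itertools import combinations

def split_pairs(n, size, rank):
    """Round-robin assignment of the row-pair indices (i, j), i < j < n,
    to `size` workers; returns the pairs owned by worker `rank`."""
    return [p for k, p in enumerate(combinations(range(n), 2)) if k % size == rank]

# Hypothetical MPI driver (run with e.g. `mpiexec -n 4 python script.py`):
# from mpi4py import MPI
# comm = MPI.COMM_WORLD
# my_pairs = split_pairs(M.shape[0], comm.Get_size(), comm.Get_rank())
# local = [time_consuming_calculation(M[i, :], M[j, :]) for i, j in my_pairs]
# results = comm.gather(local, root=0)  # collect per-rank results on rank 0
```

Because every rank holds its own copy of M (or reads it independently), this avoids the shared-memmap contention suspected above.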