Efficient way to calculate on all row-pairs of a large matrix?

I need to do some time-consuming calculations on all row pairs of a large matrix M, like:

# visit every unordered pair of rows (i < j) exactly once
for i in range(n):
    for j in range(i+1,n):
        time_comsuming_calculation(M[i,:],M[j,:])

Since I am new to parallel computing, after studying the example in "Writing parallel computation results in shared memory", I tried to do the computation in parallel with joblib, as below:

import itertools
from joblib import Parallel, delayed, dump, load

dump(M, M_name)
M=load(M_name,mmap_mode='r')
...
Parallel(n_jobs=num_cores)(delayed(paracalc)(u,v,M)
                                for u,v in itertools.combinations(range(M.shape[0]),2))

However, it turned out to be unbearably slower than the non-parallel version. Computing each row pair took even more seconds than with num_cores=1. I am wondering what's wrong with my parallel implementation. Is mpi4py a better choice? Any suggestions will be appreciated.
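One thing that may be worth ruling out in a setup like this is per-task overhead: the generator above submits every row pair as its own task. The following is only a minimal sketch of grouping pairs into chunks so that each task covers many pairs. The names `chunked_pairs`, `process_chunk`, and `dot` are hypothetical, `dot` stands in for the real calculation, and a plain list of lists stands in for the matrix.

```python
import itertools

def chunked_pairs(n, chunk_size):
    """Yield lists of up to chunk_size index pairs (i, j) with i < j < n."""
    pairs = itertools.combinations(range(n), 2)
    while True:
        chunk = list(itertools.islice(pairs, chunk_size))
        if not chunk:
            return
        yield chunk

def process_chunk(M, chunk, func):
    """Apply func to every row pair in one chunk; returns [(i, j, result), ...]."""
    return [(i, j, func(M[i], M[j])) for i, j in chunk]

def dot(a, b):
    # Hypothetical stand-in for the expensive per-pair calculation.
    return sum(x * y for x, y in zip(a, b))

# Small in-memory example; a NumPy array could be indexed the same way.
M = [[1, 2], [3, 4], [5, 6], [7, 8]]
results = []
for chunk in chunked_pairs(len(M), chunk_size=4):
    results.extend(process_chunk(M, chunk, dot))
```

With joblib, each chunk could then be submitted as a single `delayed(process_chunk)(M, chunk, func)` task instead of one task per pair. Whether this helps depends on how long one pair takes relative to the cost of dispatching a task.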

Okay, still no answers, but I've managed to work it out. The first interesting fact I found is that when I commented out these two lines,

# dump(M, M_name)
# M=load(M_name,mmap_mode='r')

so that the memmap array no longer took the place of the in-memory array, it ran much faster. I still don't know why. Is there a memmap lock or something?

Then I read the article "Parallel and HPC with Python (or numpy)" and decided to turn to mpi4py. After hours of struggling with debugging, I got satisfactory results.
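The mpi4py code itself is not shown in the post. As an illustration only, and not the author's implementation, here is one way the row pairs might be divided among processes: pair number k goes to rank k % size. The helper name `pairs_for_rank` is hypothetical, and the ranks are simulated in plain Python so the sketch runs without MPI.

```python
import itertools

def pairs_for_rank(n, rank, size):
    """Return the (i, j) pairs with i < j < n assigned to one process.

    Pairs are dealt out round-robin: pair number k goes to rank k % size.
    """
    all_pairs = itertools.combinations(range(n), 2)
    return list(itertools.islice(all_pairs, rank, None, size))

# Simulate 3 processes sharing the 10 row pairs of a 5-row matrix.
n, size = 5, 3
shares = [pairs_for_rank(n, r, size) for r in range(size)]
for r, share in enumerate(shares):
    print(r, share)
```

In an actual mpi4py program, `rank` and `size` would come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`, each process would compute only its own share, and the partial results could be collected on one rank with `comm.gather`.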
