
Matrix multiplication with MPI

I am new to parallel programming. I am trying to multiply two matrices. I have partitioned the problem as follows:

Let the operation be mat3 = mat1 x mat2. I am broadcasting mat2 to all the processes in the communicator, cutting mat1 into strips of rows, and scattering the strips to the processes in the communicator group. Once every process has the entire mat2 and its strip of mat1, each one multiplies its strip by mat2; I then use a gather operation to collect the local results and accumulate the entire result in the root process.
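A minimal sketch of this broadcast/scatter/gather scheme in C with MPI might look like the following. The matrix size N and the assumption that N is divisible by the number of processes are illustrative choices, not part of the original question.

```c
#include <mpi.h>
#include <stdlib.h>

#define N 512   /* hypothetical size; assumed divisible by the process count */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;                          /* rows of mat1 per process */
    double *mat1 = NULL, *mat3 = NULL;
    double *mat2  = malloc(N * N * sizeof(double));
    double *strip = malloc(rows * N * sizeof(double));
    double *local = calloc(rows * N, sizeof(double));

    if (rank == 0) {
        mat1 = malloc(N * N * sizeof(double));
        mat3 = malloc(N * N * sizeof(double));
        /* ... fill mat1 and mat2 on the root ... */
    }

    /* Every process receives the whole mat2 ...                         */
    MPI_Bcast(mat2, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    /* ... and a contiguous strip of rows of mat1.                       */
    MPI_Scatter(mat1, rows * N, MPI_DOUBLE,
                strip, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Local product: strip x mat2.                                      */
    for (int i = 0; i < rows; ++i)
        for (int k = 0; k < N; ++k)
            for (int j = 0; j < N; ++j)
                local[i * N + j] += strip[i * N + k] * mat2[k * N + j];

    /* Collect the result strips on the root into mat3.                  */
    MPI_Gather(local, rows * N, MPI_DOUBLE,
               mat3, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```

If N is not divisible by the number of processes, the same pattern works with MPI_Scatterv and MPI_Gatherv and per-process counts and displacements.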

I wanted to know whether there is a better way to partition this problem for multiplying two matrices on a general-purpose computer.

My implementation is in OpenMPI.

There are a variety of algorithms in the literature on matrix multiplication that can be extended to the MPI paradigm. For example:

> 1D-systolic [1];
> 2D-systolic, Cannon's algorithm [2];
> Fox's algorithm [3];
> Berntsen's algorithm [4];
> DNS algorithm [5].

If you ignore special matrix properties (sparsity, etc.), it basically comes down to how the data is distributed among the processes so as to minimize synchronization and load imbalance (the amount of work assigned to each process).
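As an illustration of a 2D data distribution, here is a minimal sketch of setting up a Cannon-style process grid in C with MPI. It assumes the number of processes is a perfect square and omits the block allocation and the local multiply; it is only meant to show the topology and the neighbour shifts, not a complete implementation of Cannon's algorithm.

```c
#include <mpi.h>
#include <math.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int q = (int)sqrt((double)size);        /* process grid is q x q        */
    int dims[2]    = {q, q};
    int periods[2] = {1, 1};                /* wrap-around for circular shifts */
    MPI_Comm grid;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid);

    int coords[2];
    MPI_Cart_coords(grid, rank, 2, coords); /* this process's (row, col)    */

    /* Neighbours for shifting the A blocks left and the B blocks up.       */
    int left, right, up, down;
    MPI_Cart_shift(grid, 1, 1, &left, &right);
    MPI_Cart_shift(grid, 0, 1, &up, &down);

    /* A Cannon iteration would then repeat q times:
     *   local C += local A x local B;
     *   MPI_Sendrecv_replace(A block, ..., left, ..., right, ..., grid, ...);
     *   MPI_Sendrecv_replace(B block, ..., up,   ..., down,  ..., grid, ...);
     */

    MPI_Finalize();
    return 0;
}
```

Compared with the row-strip scheme, each process holds only an (N/q) x (N/q) block of each matrix instead of all of mat2, which reduces per-process memory and communication volume as the process count grows.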

In this recent work you can see two different data distribution approaches and a comparison between them.

Papers:

[1] Golub G. H. and Van Loan C. F., "Matrix Computations", Johns Hopkins University Press, 1989.
[2] Whaley R. C., Petitet A., and Dongarra J. J., "Automated empirical optimizations of software and the ATLAS project", Parallel Computing, vol. 27, no. 1-2, pp. 3-35, 2001.
[3] Fox G. C., Otto S. W., and Hey A. J. G., "Matrix algorithms on a hypercube I: Matrix multiplication", Parallel Computing, vol. 4, pp. 17-31, 1987.
[4] Berntsen J., "Communication efficient matrix multiplication on hypercubes", Parallel Computing, vol. 12, pp. 335-342, 1989.
[5] Ranka S. and Sahni S., "Hypercube Algorithms for Image Processing and Pattern Recognition", Springer-Verlag, New York, NY, 1990.
