
Understanding matrix multiplication in CUDA

I am trying to learn CUDA. I started with matrix multiplication, following this GPU-based article. My main problem is that I am unable to understand how to access a 2D array in the kernel, since accessing a 2D array there is a bit different from the conventional method (matrix[i][j]). This is the part where I am stuck:

for (int i = 0; i < N; i++) {
    tmpSum += A[ROW * N + i] * B[i * N + COL];
}
C[ROW * N + COL] = tmpSum;

I could understand how ROW and COL were derived:

int ROW = blockIdx.y*blockDim.y+threadIdx.y;
int COL = blockIdx.x*blockDim.x+threadIdx.x;

Any explanation with an example is highly appreciated. Thanks!

Matrices are stored contiguously, i.e. one row after another at consecutive memory locations (row-major order). What you see here is called flat addressing: the two-element index (ROW, COL) is turned into a single offset from the first element, ROW * N + COL, so A[ROW * N + i] reads element (ROW, i) of A and B[i * N + COL] reads element (i, COL) of B. The loop therefore computes the dot product of row ROW of A with column COL of B, which is exactly the definition of element (ROW, COL) of the product C.
