简体   繁体   English

矩阵乘法CUDA

[英]Matrix Multiplication CUDA

I have been reading through several websites and even used NVIDA's code as a guide but I am still getting the wrong answer. 我一直在阅读几个网站,甚至使用NVIDA的代码作为指南,但我仍然得到了错误的答案。 The main will ask the user for size, and will display A and B then display the resulting matrix C. However say I run a 2x2 matrix for both A and B this is my sample output: 主要询问用户的大小,并显示A和B然后显示结果矩阵C.但是说我为A和B运行2x2矩阵这是我的样本输出:

Matrix A
0.000000 8.000000
2.000000 2.000000


Matrix B
3.000000 1.000000
5.000000 7.000000


Matrix C (Results)
0.000000 9.000000
7.000000 4.000000

But that's incorrect. 但这是不正确的。 It should be: 它应该是:

40.000 56.000
16.000 16.000

I changed it from decimals to whole numbers so that it would be easier to check, and I found that it's incorrect. 我将它从小数改为整数,以便更容易检查,我发现它是不正确的。 I do not understand why it would be incorrect, especially even though I took it right from their code sample. 我不明白为什么它会不正确,特别是即使我从他们的代码示例中采取了它。

#ifndef _MATRIXMUL_KERNEL_H_
#define _MATRIXMUL_KERNEL_H_

#include <stdio.h>

// Thread block size
#define BLOCK_SIZE 16
#define TILE_SIZE  16



// CUDA Kernel
__global__ void matrixMul( float* C, float* A, float* B, int wA, int wB)
{
    // Block index
    int bx = blockIdx.x;
    int by = blockIdx.y;

// Thread index
int tx = threadIdx.x;
int ty = threadIdx.y;

// Index of the first sub-matrix of A processed 
// by the block
int aBegin = wA * BLOCK_SIZE * by;

// Index of the last sub-matrix of A processed 
// by the block
int aEnd   = aBegin + wA - 1;

// Step size used to iterate through the 
// sub-matrices of A
int aStep  = BLOCK_SIZE;

// Index of the first sub-matrix of B processed 
// by the block
int bBegin = BLOCK_SIZE * bx;

// Step size used to iterate through the 
// sub-matrices of B
int bStep  = BLOCK_SIZE * wB;
float Csub=0;
// Loop over all the sub-matrices of A and B
// required to compute the block sub-matrix
for (int a = aBegin, b = bBegin; a <= aEnd; a += aStep, b += bStep) 
{
    // Declaration of the shared memory array As 
    // used to store the sub-matrix of A
    __shared__ float As[BLOCK_SIZE][BLOCK_SIZE];

    // Declaration of the shared memory array Bs 
    // used to store the sub-matrix of B
    __shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];

    // Load the matrices from global memory
    // to shared memory; each thread loads
    // one element of each matrix
    As[ty][tx] = A[a + wA * ty + tx];
    Bs[ty][tx] = B[b + wB * ty + tx];

    // Synchronize to make sure the matrices 
    // are loaded
    __syncthreads();

    // Multiply the two matrices together;
    // each thread computes one element
    // of the block sub-matrix
    for (int k = 0; k < BLOCK_SIZE; ++k)
        Csub += As[ty][k] * Bs[k][tx];

    // Synchronize to make sure that the preceding
    // computation is done before loading two new
    // sub-matrices of A and B in the next iteration
    __syncthreads();
}
// Write the block sub-matrix to device memory;
// each thread writes one element
int c = wB * BLOCK_SIZE * by + BLOCK_SIZE * bx;
C[c + wB * ty + tx] = Csub;
}

#endif // #ifndef _MATRIXMUL_KERNEL_H_

host code: 主机代码:

    //perform the calculation
    //setup execution parameters
    dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
    dim3 grid(c.colSize / threads.x, c.rowSize / threads.y);

    //   execute the kernel
    matrixMul<<< grid, threads >>>(deviceMatrixC, deviceMatrixA, deviceMatrixB, a.colSize, b.colSize);

Thanks for your help, Dan 谢谢你的帮助,丹

The code you are using implicitly requires that the size of the matrices are round multiples of the block size (16x16 in this case). 您隐式使用的代码要求矩阵的大小是块大小的四倍(在这种情况下为16x16)。 The inner product calculation processes a tile width at a time without checking for out of bounds memory access. 内积计算一次处理一个区块宽度,而不检查超出范围内存访问。 For this reason, 2x2 matrices will not work. 因此,2x2矩阵将无法正常工作。

If you try running kernel with a 16x16 input (for example zero padding your 2x2 case to 16x16), you should be able to confirm the result. 如果您尝试使用16x16输入运行内核(例如将2x2大小写填充为16x16的零填充),则应该能够确认结果。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM