
Configuration parameters of a CUDA kernel

I have to add two square N x N matrices using a CUDA program. The book asks for the kernel configuration parameters in the following cases:

(a) Each thread processes only one matrix element

(b) Each thread produces one output matrix row

(c) Each thread produces one output matrix column

My solutions for the above:

(a)

dim3 threadPerBlocks(1,1,1);
dim3 numBlocks(N,N,1);

(b)

dim3 threadPerBlocks(N,1,1);
dim3 numBlocks(1,N,1);

(c)

dim3 threadPerBlocks(1,N,1);
dim3 numBlocks(N,1,1);

I have no idea whether I am right or wrong for parts (b) and (c). Please tell me and give a brief explanation (if they are wrong, please correct me and explain).

(a) is more or less fine, but it can be written in other ways. All that is required is N x N threads in total, so that each thread processes one element.

An alternative for (a), assuming N does not exceed the maximum number of threads per block (1024 on current GPUs), is

dim3 threadPerBlocks(N,1,1);
dim3 numBlocks(N,1,1);

And in the kernel you compute the index as

id = blockIdx.x * blockDim.x + threadIdx.x ;

array[id] = ... ; // process one element.
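A minimal sketch of case (a) as a complete kernel, under some assumptions that are not in the original: the matrices hold `float` values in row-major flat arrays, the names `addPerElement` and `runAddPerElement` are hypothetical, and a fixed 16 x 16 block with a rounded-up 2D grid replaces the N-threads-per-block layout, since a block is limited to 1024 threads and N may be larger. The bounds check covers the extra threads created by rounding up.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// One thread per matrix element (row-major storage is an assumption).
__global__ void addPerElement(const float* a, const float* b, float* c, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) {            // skip threads outside the matrix
        int id = row * n + col;
        c[id] = a[id] + b[id];
    }
}

// Hypothetical host helper: adds two n x n test matrices on the GPU and
// returns true if every element of the result matches the host sum.
bool runAddPerElement(int n)
{
    assert(n > 0);
    const size_t count = static_cast<size_t>(n) * n;
    const size_t bytes = count * sizeof(float);
    std::vector<float> hA(count), hB(count), hC(count);
    for (size_t i = 0; i < count; ++i) {
        hA[i] = static_cast<float>(i);
        hB[i] = 2.0f * static_cast<float>(i);
    }

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 threadsPerBlock(16, 16, 1);
    dim3 numBlocks((n + 15) / 16, (n + 15) / 16, 1);   // round up
    addPerElement<<<numBlocks, threadsPerBlock>>>(dA, dB, dC, n);

    cudaError_t err = cudaGetLastError();               // launch errors
    if (err == cudaSuccess)                             // copy waits for the kernel
        err = cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    if (err != cudaSuccess) return false;

    for (size_t i = 0; i < count; ++i)
        if (hC[i] != hA[i] + hB[i]) return false;
    return true;
}
```

Calling `runAddPerElement(100)` from `main` should return `true` on a machine with a CUDA-capable GPU.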

But for (b), each thread must produce one output matrix row, so you need only N threads, one per row. With what you have written you still end up with N x N threads.

So you can write it this way. This is one possible way; there are others.

dim3 threadPerBlocks(N,1,1);
dim3 numBlocks(1,1,1);

idx = threadIdx.x ; 

Then use a for loop in each thread to process one row.

for (i = 0 ; i < N ; i++)
{
    index = idx * N + i ;
    array [index] = ..... ;   
}
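The row loop above can be placed in a complete kernel roughly as follows. This is a sketch under assumptions that are not in the original: `float` data in row-major flat arrays, the hypothetical names `addPerRow` and `runAddPerRow`, and 256 threads per block with a rounded-up block count rather than a single block of N threads, so that N may exceed the 1024-thread block limit.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// One thread per output row: n threads in total, each looping over n columns.
__global__ void addPerRow(const float* a, const float* b, float* c, int n)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {                        // skip threads past the last row
        for (int col = 0; col < n; ++col) {
            int index = row * n + col;
            c[index] = a[index] + b[index];
        }
    }
}

// Hypothetical host helper: runs the kernel on n x n test matrices and
// returns true if the GPU result matches the host sum.
bool runAddPerRow(int n)
{
    assert(n > 0);
    const size_t count = static_cast<size_t>(n) * n;
    const size_t bytes = count * sizeof(float);
    std::vector<float> hA(count), hB(count), hC(count);
    for (size_t i = 0; i < count; ++i) {
        hA[i] = static_cast<float>(i);
        hB[i] = 2.0f * static_cast<float>(i);
    }

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 threadsPerBlock(256, 1, 1);
    dim3 numBlocks((n + 255) / 256, 1, 1);              // round up
    addPerRow<<<numBlocks, threadsPerBlock>>>(dA, dB, dC, n);

    cudaError_t err = cudaGetLastError();               // launch errors
    if (err == cudaSuccess)                             // copy waits for the kernel
        err = cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    if (err != cudaSuccess) return false;

    for (size_t i = 0; i < count; ++i)
        if (hC[i] != hA[i] + hB[i]) return false;
    return true;
}
```

Calling `runAddPerRow(300)` launches two blocks of 256 threads, of which only the first 300 threads do any work.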

You can handle case (c) in the same way: each thread takes one column, so the index inside the loop becomes i * N + idx.
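For case (c), a sketch along the same lines, again under assumptions that are not in the original: `float` data in row-major flat arrays, the hypothetical names `addPerColumn` and `runAddPerColumn`, and 256 threads per block with a rounded-up block count. Each thread takes its column from its global index and loops over the rows.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// One thread per output column: n threads in total, each looping over n rows.
__global__ void addPerColumn(const float* a, const float* b, float* c, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col < n) {                        // skip threads past the last column
        for (int row = 0; row < n; ++row) {
            int index = row * n + col;    // row-major: the column is fixed per thread
            c[index] = a[index] + b[index];
        }
    }
}

// Hypothetical host helper: runs the kernel on n x n test matrices and
// returns true if the GPU result matches the host sum.
bool runAddPerColumn(int n)
{
    assert(n > 0);
    const size_t count = static_cast<size_t>(n) * n;
    const size_t bytes = count * sizeof(float);
    std::vector<float> hA(count), hB(count), hC(count);
    for (size_t i = 0; i < count; ++i) {
        hA[i] = static_cast<float>(i);
        hB[i] = 2.0f * static_cast<float>(i);
    }

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 threadsPerBlock(256, 1, 1);
    dim3 numBlocks((n + 255) / 256, 1, 1);              // round up
    addPerColumn<<<numBlocks, threadsPerBlock>>>(dA, dB, dC, n);

    cudaError_t err = cudaGetLastError();               // launch errors
    if (err == cudaSuccess)                             // copy waits for the kernel
        err = cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    if (err != cudaSuccess) return false;

    for (size_t i = 0; i < count; ++i)
        if (hC[i] != hA[i] + hB[i]) return false;
    return true;
}
```

With row-major storage, neighbouring threads in this version touch neighbouring addresses on each loop iteration, which generally suits the GPU's memory system better than the one-row-per-thread layout.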

