
Basic/simple formula to compute the number of blocks needed in a CUDA kernel

I am having some difficulty getting the correct number of blocks per grid in CUDA. Can anyone show a basic/simple formula to compute the number of blocks needed for a 2D CUDA kernel (i.e. gridDim.x and gridDim.y), given that the user wants to run N threads in total and the blocks are A by B (where A*B <= 512 or 1024, depending on compute capability)? For a simple case, let's assume the blocks are 8 by 8. Also, can you point out what we have to keep in mind, for example does it really matter whether these values are powers of two or not?

 dim3 dimBlock(A,B);
 dim3 dimGrid(Z,T);

I am looking for Z and T. Thanks!

The total number of threads N is given by

N = No_blocks * No_threads_per_block

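Each block has A*B threads, so you need Z*T = N/(A*B) blocks. If N is not an exact multiple of A*B, round the quotient up so that Z*T*A*B >= N, and have the kernel skip the surplus threads with a bounds check.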

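Z and T must of course be integers, but they do not need to be powers of 2. Also, depending on your CUDA compute capability, there are limits on the number of blocks in each dimension of the grid: 65535 per dimension on compute capability 2.x and earlier, and 2^31-1 in x with 65535 in y and z from compute capability 3.0 onward. The block size A*B is a separate matter: it usually works best as a multiple of the warp size (32), but it does not have to be a power of 2 either.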
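When N is not a multiple of A*B, the division has to be rounded up, and there is more than one valid way to split the resulting block count into Z and T. The following is a minimal host-side sketch of that arithmetic, not a definitive recipe. It is plain C++ so that it compiles without the CUDA toolkit, and it can be pasted into a .cu file unchanged. The names ceil_div, Grid2D, grid_for_2d_domain and grid_for_total_threads are hypothetical helpers introduced here for illustration; Grid2D only stands in for the x and y fields of dim3. The roughly square grid in the second function is an assumption; any Z and T whose product covers the required block count would do.

```cpp
#include <cassert>
#include <cmath>

// Integer ceiling division: smallest q with q * b >= a (requires b > 0).
unsigned long long ceil_div(unsigned long long a, unsigned long long b) {
    assert(b > 0);
    return (a + b - 1) / b;
}

// Hypothetical stand-in for dim3 so the sketch compiles without the CUDA
// toolkit; in a .cu file you would write: dim3 dimGrid(g.z, g.t);
struct Grid2D {
    unsigned int z;  // becomes gridDim.x
    unsigned int t;  // becomes gridDim.y
};

// Case 1: one thread per element of a W x H array (W, H > 0),
// using blocks of A x B threads. Each dimension is rounded up separately.
Grid2D grid_for_2d_domain(unsigned int W, unsigned int H,
                          unsigned int A, unsigned int B) {
    return { static_cast<unsigned int>(ceil_div(W, A)),
             static_cast<unsigned int>(ceil_div(H, B)) };
}

// Case 2: only the total thread count N is known. Choose a roughly square
// grid (an arbitrary choice) whose Z * T blocks cover at least N threads.
Grid2D grid_for_total_threads(unsigned long long N,
                              unsigned int A, unsigned int B) {
    unsigned long long per_block = static_cast<unsigned long long>(A) * B;
    unsigned long long blocks = ceil_div(N, per_block);
    if (blocks == 0) blocks = 1;  // a grid must contain at least one block

    unsigned long long z = static_cast<unsigned long long>(
        std::ceil(std::sqrt(static_cast<double>(blocks))));
    while (z * z < blocks) ++z;   // correct any floating-point rounding

    unsigned long long t = ceil_div(blocks, z);  // guarantees z * t >= blocks
    return { static_cast<unsigned int>(z), static_cast<unsigned int>(t) };
}
```

With N = 1000 and 8 by 8 blocks, the second function asks for 16 blocks arranged as a 4 by 4 grid, which launches 1024 threads. The 24 surplus threads have to be filtered out inside the kernel: compute a linear index such as (blockIdx.y * gridDim.x + blockIdx.x) * (blockDim.x * blockDim.y) + threadIdx.y * blockDim.x + threadIdx.x and return early when it is >= N. The sketch does not check the per-dimension grid limits of the device, so very large N would need an extra check against those limits.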


 