
CUDA threads and blocks explanation

I have been following a tutorial here: http://www.nvidia.com/docs/IO/116711/sc11-cuda-c-basics.pdf

I'm trying to teach myself basic GPU programming, but I still don't quite understand the topology of blocks and threads. On page 42 the code defines the data size as follows:

#define N (2048*2048)
#define THREADS_PER_BLOCK 512

Is this tutorial making assumptions about the hardware? I'm currently on a laptop with an NVIDIA 520M GPU. Using the cudaDeviceProp structure I was able to determine that I am capable of running 1024 threads per block. What exactly does the 2048*2048 quantify? The number of blocks? How do I know if that value is correct?

The quantity N (2048*2048) is the overall size of the data set. The example is a vector add, so each of the vectors being added has N elements.

Threads per block is already defined as 512.

The number of blocks can be determined from the kernel launch:

add<<<N/THREADS_PER_BLOCK,THREADS_PER_BLOCK>>>(d_a, d_b, d_c);
       ^                          ^  
    number of blocks            number of threads in each block

So the total number of blocks in the grid being launched is 2048*2048/512 = 8192

These particular parameters (512 threads per block, 8192 blocks total) should be compatible with any currently available CUDA GPU.
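For reference, the vector-add kernel that goes with this launch looks roughly like the sketch below (reconstructed from the tutorial's description, not copied verbatim). Each thread computes one output element, deriving its global index from its block and thread indices:

```cuda
// Sketch of a vector-add kernel matching the launch configuration above.
// Each of the N threads in the grid handles exactly one element.
__global__ void add(int *a, int *b, int *c) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    c[index] = a[index] + b[index];
}
```

Because N is an exact multiple of THREADS_PER_BLOCK here, no bounds check is needed. For a size that does not divide evenly, you would round the block count up and guard the body with `if (index < n)` so the extra threads in the last block do nothing.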
