
CUDA threads and blocks explanation

I have been following a tutorial here: http://www.nvidia.com/docs/IO/116711/sc11-cuda-c-basics.pdf

I'm trying to teach myself basic GPU programming, but I still don't quite understand the topology of blocks and threads. On page 42 the code defines the data size as follows:

#define N (2048*2048)
#define THREADS_PER_BLOCK 512

Is this tutorial making assumptions about the hardware? I'm currently on a laptop with an NVIDIA 520M GPU. Using the cudaDeviceProp structure I was able to determine that I am capable of running 1024 threads per block. What exactly does the 2048*2048 quantify? The number of blocks? How do I know if that value is correct?

The quantity N (2048*2048) is the overall size of the data set. The example is a vector add, so each of the vectors being added has N elements.

Threads per block is already defined as 512.

The number of blocks can be determined from the kernel launch:

add<<<N/THREADS_PER_BLOCK,THREADS_PER_BLOCK>>>(d_a, d_b, d_c);
       ^                          ^  
    number of blocks            number of threads in each block

So the total number of blocks in the grid being launched is 2048*2048/512 = 8192

These particular parameters (512 threads per block, 8192 blocks total) should be compatible with any currently available CUDA GPU.
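For reference, the vector-add kernel that goes with this launch looks roughly like the sketch below (reconstructed from the tutorial's description, not copied verbatim). Each thread computes one output element, deriving its global index from its block and thread indices:

```cuda
// Sketch of a vector-add kernel matching the launch configuration above.
// Each of the N threads in the grid handles exactly one element.
__global__ void add(int *a, int *b, int *c) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    c[index] = a[index] + b[index];
}
```

Because N is an exact multiple of THREADS_PER_BLOCK here, no bounds check is needed. For a size that does not divide evenly, you would round the block count up and guard the body with `if (index < n)` so the extra threads in the last block do nothing.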
