
CUDA threads and blocks explanation

I have been following the tutorial at http://www.nvidia.com/docs/IO/116711/sc11-cuda-c-basics.pdf to teach myself basic GPU programming. I still don't quite understand the topology of blocks and threads. On page 42 the code defines the data size as follows:

#define N (2048*2048)
#define THREADS_PER_BLOCK 512

Is this tutorial making assumptions? I'm currently on a laptop with an Nvidia 520M GPU. Using the cudaDeviceProp structure, I was able to determine that I am capable of running 1024 threads per block. What exactly does the 2048x2048 quantify? The number of blocks? How do I know if that is correct?

The N (2048*2048) quantity is the overall size of the data set, not the number of blocks. This problem is a vector add problem, so the overall size of the vectors to be added is N elements.
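To make the role of N concrete, here is a minimal sketch of a vector add along the lines of the tutorial, with one thread per element. The kernel follows the usual threadIdx/blockIdx/blockDim indexing; the helper run_vector_add and the test data (a[i] = i, b[i] = 2*i) are my own assumptions for illustration, not something taken from the slides.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N (2048 * 2048)          // total number of vector elements
#define THREADS_PER_BLOCK 512    // threads in each block

// One thread per element: the global index combines the block's position
// in the grid with the thread's position inside its block.
__global__ void add(const int *a, const int *b, int *c) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    c[index] = a[index] + b[index];
}

// Hypothetical helper (not from the tutorial): runs the whole vector add
// and returns true only if every call succeeded and every sum is correct.
bool run_vector_add() {
    const size_t size = (size_t)N * sizeof(int);
    int *a = (int *)malloc(size);
    int *b = (int *)malloc(size);
    int *c = (int *)malloc(size);
    if (!a || !b || !c) { free(a); free(b); free(c); return false; }
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; }

    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    bool ok = cudaMalloc(&d_a, size) == cudaSuccess &&
              cudaMalloc(&d_b, size) == cudaSuccess &&
              cudaMalloc(&d_c, size) == cudaSuccess &&
              cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // N / THREADS_PER_BLOCK blocks, each with THREADS_PER_BLOCK threads.
        add<<<N / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(d_a, d_b, d_c);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    // Verify on the host: a[i] + b[i] should equal 3 * i.
    for (int i = 0; ok && i < N; ++i) ok = (c[i] == 3 * i);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(a); free(b); free(c);
    return ok;
}

int main() {
    printf("vector add %s\n", run_vector_add() ? "succeeded" : "failed");
    return 0;
}
```

Because the kernel has no bounds check, this form only works when N is an exact multiple of THREADS_PER_BLOCK, which is the case for the values used here.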

The number of threads per block is already defined as 512 (THREADS_PER_BLOCK).

The number of blocks can be determined from the kernel launch:

add<<<N/THREADS_PER_BLOCK,THREADS_PER_BLOCK>>>(d_a, d_b, d_c);
       ^                          ^  
    number of blocks            number of threads in each block

So the total number of blocks in the grid being launched is 2048*2048/512 = 8192.
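The same arithmetic can be checked at compile time. The sketch below is host-only code (it compiles with nvcc but launches nothing), and the helper names blocks_exact and blocks_rounded_up are hypothetical names of mine. The rounded-up variant is the usual approach when the data size is not a multiple of the block size, assuming the kernel also guards against out-of-range indices.

```cuda
#include <cstdio>

#define N (2048 * 2048)
#define THREADS_PER_BLOCK 512

// Exact division: only correct when n is a multiple of threads_per_block.
constexpr int blocks_exact(int n, int threads_per_block) {
    return n / threads_per_block;
}

// Rounded-up division: enough blocks to cover n elements even when n is
// not a multiple of threads_per_block (the last block is partly idle).
constexpr int blocks_rounded_up(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// 2048 * 2048 = 4194304 elements, and 4194304 / 512 = 8192 blocks.
static_assert(N % THREADS_PER_BLOCK == 0, "N must divide evenly here");
static_assert(blocks_exact(N, THREADS_PER_BLOCK) == 8192, "expected 8192 blocks");

int main() {
    printf("elements: %d\n", N);
    printf("blocks (exact): %d\n", blocks_exact(N, THREADS_PER_BLOCK));
    printf("blocks for 1000 elements (rounded up): %d\n",
           blocks_rounded_up(1000, THREADS_PER_BLOCK));
    return 0;
}
```

With the tutorial's values both helpers give the same result, since N divides evenly by THREADS_PER_BLOCK.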

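These particular parameters (512 threads per block, 8192 blocks total) should be compatible with any currently available CUDA GPU. 512 threads per block is within the per-block limit of every compute capability (512 on 1.x devices, 1024 on 2.x and later, including your 520M), and 8192 blocks is well under the limit of 65535 blocks in the x dimension of a grid that applies to the oldest devices.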
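If you would rather check this on your own device than take it on trust, something like the following sketch compares the launch configuration against the limits reported in cudaDeviceProp. The helper launch_fits is a hypothetical name of mine, it only covers a one-dimensional launch, and device 0 is assumed to be the GPU of interest.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N (2048 * 2048)
#define THREADS_PER_BLOCK 512

// Hypothetical helper: true if a 1D launch of `blocks` blocks with
// `threads_per_block` threads each fits the limits reported for the device.
bool launch_fits(const cudaDeviceProp &prop, int blocks, int threads_per_block) {
    return blocks > 0 && threads_per_block > 0 &&
           threads_per_block <= prop.maxThreadsPerBlock &&
           threads_per_block <= prop.maxThreadsDim[0] &&
           blocks <= prop.maxGridSize[0];
}

int main() {
    cudaDeviceProp prop;
    // Assumption: device 0 is the GPU you want to inspect.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("device: %s\n", prop.name);
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("max grid size in x: %d\n", prop.maxGridSize[0]);

    const int blocks = N / THREADS_PER_BLOCK;
    printf("launch <<<%d, %d>>> %s\n", blocks, THREADS_PER_BLOCK,
           launch_fits(prop, blocks, THREADS_PER_BLOCK) ? "fits" : "does not fit");
    return 0;
}
```

On a device that reports 1024 threads per block, as yours does, the 512-thread configuration should be reported as fitting.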
