CUDA threads and blocks explained

Question

I have been following the tutorial at http://www.nvidia.com/docs/IO/116711/sc11-cuda-c-basics.pdf to teach myself basic GPU programming. I still don't quite understand the topology of blocks and threads. On page 42 the code defines the data size as follows:

#define N (2048*2048)
#define THREADS_PER_BLOCK 512

Is this tutorial making assumptions? I'm currently on a laptop with an Nvidia 520m GPU. Using the cudaDeviceProp structure, I was able to determine that it is capable of running 1024 threads per block. What exactly does the 2048x2048 quantify? The number of blocks? How do I know if that is correct?

Answer 1

The quantity N (2048*2048) is the overall size of the data set. This is a vector-add problem, so each of the vectors being added has N elements.

The number of threads per block is already defined as 512.

The number of blocks can be determined from the kernel launch:

add<<<N/THREADS_PER_BLOCK,THREADS_PER_BLOCK>>>(d_a, d_b, d_c);
       ^                          ^  
    number of blocks            number of threads in each block

So the total number of blocks in the grid being launched is 2048*2048/512 = 8192.
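To make the launch arithmetic concrete, here is a minimal sketch of a complete vector add built around the same two defines. It is not the tutorial's exact code: the helper blocksNeeded, the extra length parameter n on the kernel, and the test data are assumptions added for illustration. The rounding-up helper and the bounds check only matter when N is not a multiple of THREADS_PER_BLOCK; here the division is exact.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N (2048 * 2048)          // total number of vector elements
#define THREADS_PER_BLOCK 512    // threads in each block

// Hypothetical helper: blocks needed to cover n elements, rounding up.
int blocksNeeded(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Each thread adds one pair of elements. The n parameter and the guard
// are additions for this sketch; they skip threads that fall past the end.
__global__ void add(const int *a, const int *b, int *c, int n) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < n) {
        c[index] = a[index] + b[index];
    }
}

int main() {
    size_t size = N * sizeof(int);
    int *a = (int *)malloc(size);
    int *b = (int *)malloc(size);
    int *c = (int *)malloc(size);
    if (!a || !b || !c) return 1;
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; }

    int *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    int blocks = blocksNeeded(N, THREADS_PER_BLOCK);  // 8192 for these values
    add<<<blocks, THREADS_PER_BLOCK>>>(d_a, d_b, d_c, N);

    // A bad launch configuration is reported here rather than at the launch.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);

    int mismatches = 0;
    for (int i = 0; i < N; ++i) {
        if (c[i] != a[i] + b[i]) ++mismatches;
    }
    printf("blocks = %d, threads per block = %d, mismatches = %d\n",
           blocks, THREADS_PER_BLOCK, mismatches);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(a); free(b); free(c);
    return 0;
}
```

Built with something like nvcc vecadd.cu -o vecadd (the file name is arbitrary), each of the 8192 blocks handles 512 consecutive elements, so the grid covers all N elements exactly once.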

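These particular parameters (512 threads per block, 8192 blocks total) should be compatible with any currently available CUDA GPU. On your device, 512 threads per block is within the 1024-thread limit you read from cudaDeviceProp, and 8192 blocks is well under the 65535-block limit on the grid's x-dimension that applies to older devices (compute capability 2.x and earlier).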
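If you want to check this on your own hardware rather than take it on trust, a sketch along the following lines compares the launch parameters against the limits the runtime reports. The helper launchFits and the choice of device 0 are assumptions made for illustration; the fields read from cudaDeviceProp (name, major, minor, maxThreadsPerBlock, maxGridSize) are part of the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if a 1D launch fits within the given limits.
bool launchFits(int blocks, int threadsPerBlock,
                int maxThreadsPerBlock, int maxGridX) {
    return threadsPerBlock > 0 && threadsPerBlock <= maxThreadsPerBlock &&
           blocks > 0 && blocks <= maxGridX;
}

int main() {
    const int threadsPerBlock = 512;
    const int blocks = (2048 * 2048) / threadsPerBlock;  // 8192

    cudaDeviceProp prop;
    // Device 0 is an assumption; a multi-GPU system may need another index.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("%s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);
    printf("maxThreadsPerBlock = %d\n", prop.maxThreadsPerBlock);
    printf("maxGridSize[0]     = %d\n", prop.maxGridSize[0]);

    bool ok = launchFits(blocks, threadsPerBlock,
                         prop.maxThreadsPerBlock, prop.maxGridSize[0]);
    printf("%d blocks x %d threads %s on this device\n",
           blocks, threadsPerBlock, ok ? "fits" : "does NOT fit");
    return ok ? 0 : 1;
}
```

This only checks the two limits discussed here. A real kernel can also be constrained by register and shared-memory usage, which this sketch does not examine.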

Answer 1
1 2015-01-18 00:52:55
