
Maximum number of blocks on a GTX TITAN

I'm trying to compute Fourier transforms using CUDA on an NVIDIA GTX TITAN graphics card. I run into a problem once the number of blocks I launch exceeds a certain value.

Here is what my card reports through cudaGetDeviceProperties:

  • maxThreadsPerBlock: 1024
  • maxThreadsDim: 1024 x 1024 x 64
  • maxGridSize: 2147483647 x 65535 x 65535

Here is the code I use to call my kernel function:

cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);

unsigned int threads = prop.maxThreadsPerBlock;
unsigned int max_blocks = prop.maxGridSize[0];
unsigned int blocks = (pixel_size + threads - 1) / threads;

// Hardware limit
if (blocks > max_blocks)
  blocks = max_blocks;

kernel_function <<<blocks, threads>>>(pixel_size);

And the kernel code:

__global__ void kernel_function(unsigned int pixel_size)
{
  unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;

  while (index < pixel_size)
  {
    // Treatment here
    index += blockDim.x * gridDim.x;
  }
}

Here pixel_size is the size, in pixels, of the image block I want to transform.

So threads is always equal to 1024, which is what I want. Whenever blocks is less than or equal to 65535, my code works fine. But when blocks goes above 65535, the results are nonsense and look completely random. So what is the maximum number of blocks I can have in a one-dimensional problem? In the code above I assumed it was 2147483647. What am I doing wrong?

I feel like I am using the wrong hardware limit for the number of blocks, because when I set it to 65535 this code works fine.
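One way to surface a problem like this is to check the status of the launch instead of relying on the output alone. The sketch below is not part of the original code: launch_checked is a hypothetical helper name, and the kernel body is only a placeholder with the same grid-stride shape as kernel_function. It assumes that a grid the compiled code cannot accept makes the launch fail with an error, which would explain why the results look random (the kernel never ran).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel with the same grid-stride loop as in the question.
__global__ void kernel_function(unsigned int pixel_size)
{
  unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;

  while (index < pixel_size)
  {
    // Treatment here
    index += blockDim.x * gridDim.x;
  }
}

// Hypothetical helper: launches the kernel and reports any CUDA error.
// Returns true only if both the launch and the execution succeeded.
bool launch_checked(unsigned int blocks, unsigned int threads,
                    unsigned int pixel_size)
{
  kernel_function<<<blocks, threads>>>(pixel_size);

  // Errors detected at launch time (for example a rejected grid size)
  // are returned by cudaGetLastError.
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess)
  {
    fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
    return false;
  }

  // Errors that occur while the kernel runs are returned on synchronize.
  err = cudaDeviceSynchronize();
  if (err != cudaSuccess)
  {
    fprintf(stderr, "execution failed: %s\n", cudaGetErrorString(err));
    return false;
  }

  return true;
}
```

Calling launch_checked(blocks, threads, pixel_size) in place of the bare launch should print a message whenever the grid is rejected, rather than leaving stale data in the output buffers.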

Thank you in advance for your answers.

Problem solved: I was compiling with flags for the 2.x architecture instead of 3.5, so the 2.x limit applied (which is a maximum of 65535 blocks in the x dimension). After compiling with compute_35, sm_35, it worked.
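A mismatch like this can be checked from inside the program. The following is a minimal sketch, assuming a build that targets a single architecture; report_arch, compute_capability_code and check_compiled_arch are hypothetical names, not part of the original code. It compares the architecture the device code was compiled for (the __CUDA_ARCH__ macro) with the compute capability that cudaGetDeviceProperties reports.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Build with flags matching the card, e.g. (file name is hypothetical):
//   nvcc -gencode arch=compute_35,code=sm_35 check_arch.cu

// Writes the architecture the device code was compiled for (e.g. 200 or 350).
__global__ void report_arch(int *arch)
{
#ifdef __CUDA_ARCH__
  *arch = __CUDA_ARCH__;
#endif
}

// Converts a compute capability such as 3.5 into the __CUDA_ARCH__ form 350.
int compute_capability_code(int major, int minor)
{
  return major * 100 + minor * 10;
}

// Hypothetical helper: prints both values and returns true when they match.
bool check_compiled_arch(int device)
{
  cudaDeviceProp prop;
  if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
    return false;

  int *d_arch = NULL;
  int h_arch = 0;
  if (cudaMalloc(&d_arch, sizeof(int)) != cudaSuccess)
    return false;

  report_arch<<<1, 1>>>(d_arch);

  // The copy waits for the kernel and also returns any launch error.
  cudaError_t err = cudaMemcpy(&h_arch, d_arch, sizeof(int),
                               cudaMemcpyDeviceToHost);
  cudaFree(d_arch);
  if (err != cudaSuccess)
    return false;

  int device_code = compute_capability_code(prop.major, prop.minor);
  printf("compiled for %d, device is %d\n", h_arch, device_code);
  return h_arch == device_code;
}
```

With the flags from the original build, a check like this would be expected to report a 2.x value against a 3.5 device, which points at the lower grid limit before any results are examined.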

Thanks @talonmies.
