Maximum number of blocks on a GTX TITAN
I'm trying to compute Fourier transforms using CUDA on an NVIDIA GTX TITAN graphics card. I run into a problem once the number of blocks passes a certain value.
Here is what my card tells me when using cudaGetDeviceProperties:
Here is the code I use to call my kernel function:
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);
unsigned int threads = prop.maxThreadsPerBlock;
unsigned int max_blocks = prop.maxGridSize[0];
// Ceiling division: enough blocks to cover every pixel
unsigned int blocks = (pixel_size + threads - 1) / threads;
// Hardware limit
if (blocks > max_blocks)
    blocks = max_blocks;
kernel_function<<<blocks, threads>>>(pixel_size);
And the kernel code:
__global__ void kernel_function(unsigned int pixel_size)
{
    unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;
    // Grid-stride loop: each thread handles every (blockDim.x * gridDim.x)-th pixel
    while (index < pixel_size)
    {
        // Treatment here
        index += blockDim.x * gridDim.x;
    }
}
Here pixel_size is the size, in pixels, of the image block I want to transform.
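To make the launch arithmetic above concrete, here is a minimal host-only sketch in plain C++ (no GPU needed). The helpers grid_size and simulate_grid_stride are hypothetical names introduced for illustration; they are not part of CUDA. The first mirrors the ceiling division and clamp from the host code, using a 64-bit intermediate as an assumption to avoid overflow. The second replays the kernel's grid-stride loop on the CPU to show that a clamped grid still visits every pixel exactly once.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Ceiling division clamped to a grid-size limit, mirroring the host code.
// The 64-bit intermediate is an assumption added here to avoid overflow
// when n is close to the 32-bit maximum.
std::uint32_t grid_size(std::uint32_t n, std::uint32_t threads,
                        std::uint32_t max_blocks) {
    assert(threads > 0 && max_blocks > 0);
    std::uint64_t blocks =
        (static_cast<std::uint64_t>(n) + threads - 1) / threads;
    return blocks > max_blocks ? max_blocks
                               : static_cast<std::uint32_t>(blocks);
}

// CPU stand-in for the grid-stride loop in kernel_function.
// Returns how many times each index in [0, n) is visited.
std::vector<int> simulate_grid_stride(std::uint32_t n, std::uint32_t threads,
                                      std::uint32_t blocks) {
    std::vector<int> visits(n, 0);
    // Equivalent of blockDim.x * gridDim.x
    const std::uint64_t stride =
        static_cast<std::uint64_t>(threads) * blocks;
    for (std::uint32_t b = 0; b < blocks; ++b) {
        for (std::uint32_t t = 0; t < threads; ++t) {
            // Equivalent of blockIdx.x * blockDim.x + threadIdx.x
            std::uint64_t index = static_cast<std::uint64_t>(b) * threads + t;
            while (index < n) {
                ++visits[index];
                index += stride;
            }
        }
    }
    return visits;
}
```

For example, grid_size(100000000, 1024, 65535) is clamped to 65535, and simulate_grid_stride(1000, 8, 3) reports one visit per index even though 3 blocks of 8 threads cannot cover 1000 pixels in a single pass. Clamping the grid is therefore safe as long as the clamp value is a limit the compiled kernel actually honours.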
So threads is always equal to 1024, which is what I want. Whenever blocks is less than or equal to 65535, my code works fine. But when blocks goes above 65535, the results are nonsensical and completely random. So what is the maximum number of blocks I can have in a one-dimensional problem? I assumed in the code above that it was 2147483647. What am I doing wrong?
I feel like I am using the wrong hardware limit for my number of blocks, because when I set it to 65535 this code works fine.
Thank you in advance for your answers.
Problem solved: I was compiling with flags for the 2.x architecture instead of 3.5, so the 2.x limit applied (which is 65535 blocks max in the x dimension). After compiling with compute_35, sm_35, it worked.
Thanks @talonmies.
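The mismatch described here can be sketched in a few lines of host-only C++. This is an illustration under stated assumptions, not a CUDA API: max_grid_dim_x and launch_fits are hypothetical helpers, and the limits encoded are the documented x-dimension grid limits (65535 for compute capability 1.x and 2.x, 2^31 - 1 from 3.0 onwards). The point is that the limit that matters is the one for the architecture the kernel was compiled for, which may be lower than what the device reports.

```cpp
#include <cassert>
#include <cstdint>

// Maximum x-dimension of a grid by compute capability major version:
// 65535 for 1.x and 2.x, 2^31 - 1 from 3.0 onwards.
// Hypothetical helper for illustration, not part of the CUDA runtime.
std::uint32_t max_grid_dim_x(int cc_major) {
    assert(cc_major >= 1);
    return cc_major < 3 ? 65535u : 2147483647u;
}

// True when a 1D launch of `blocks` blocks fits the limit of the
// architecture the kernel was compiled for (also hypothetical).
bool launch_fits(std::uint64_t blocks, int compiled_cc_major) {
    return blocks >= 1 && blocks <= max_grid_dim_x(compiled_cc_major);
}
```

With these helpers, launch_fits(65536, 2) is false while launch_fits(65536, 3) is true, which matches the behaviour in the question: the same block count fails when built for 2.x and works when built for compute_35, sm_35.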