CUDA double matrix overflow
I wrote a program that doubles every element of a given matrix. It works fine when the matrix size is 100, but if I change the size to 500 it "stopped working" due to an overflow. Can anyone help me understand why?
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <stdlib.h>

// Doubles each of the n elements of a into c, one thread per element.
__global__ void kernel_double(int *c, const int *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // The grid is rounded up to whole blocks, so the last block has
    // spare threads that must not touch memory past the end.
    if (i < n) {
        c[i] = a[i] * 2;
    }
}

int main()
{
    const int size = 100;
    // failed when size = 500, Unhandled exception at 0x00123979 in
    // doublify.exe: 0xC00000FD:
    // Stack overflow (parameters: 0x00000000, 0x00602000).
    int a[size][size], c[size][size];
    int sum_a = 0;
    int sum_c = 0;

    // Fill a with random digits and sum them on the host.
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            a[i][j] = rand() % 10;
            sum_a += a[i][j];
        }
    }
    printf("sum of matrix a is %d \n", sum_a);

    // Copy a to the device, double it there, and copy the result back into c.
    int *dev_a = 0;
    int *dev_c = 0;
    cudaMalloc((void**)&dev_c, size * size * sizeof(int));
    cudaMalloc((void**)&dev_a, size * size * sizeof(int));
    cudaMemcpy(dev_a, a, size * size * sizeof(int), cudaMemcpyHostToDevice);
    printf("grid size %d \n", int(size * size / 1024) + 1);
    kernel_double<<<int(size * size / 1024) + 1, 1024>>>(dev_c, dev_a, size * size);
    cudaDeviceSynchronize();
    cudaMemcpy(c, dev_c, size * size * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_c);
    cudaFree(dev_a);

    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            sum_c += c[i][j];
        }
    }
    printf("sum of matrix c is %d \n", sum_c);
    return 0;
}
Here is the output when size equals 100:
sum of matrix a is 44949
grid size 10
sum of matrix c is 89898
Press any key to continue . . .
My development environment is MSVS2015 V14, CUDA8.0 and GTX1050Ti.
You're getting a stack overflow with a size of 500 because you declare two local arrays with 250,000 elements each. With 4-byte ints this works out to about 2 MB of stack space, which is more than the 1 MB that the MSVC linker reserves for the stack by default. At a size of 100 the two arrays need only about 80 KB, which is why that case works.
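The arithmetic behind that estimate can be checked with a short host-only sketch. It assumes the MSVC linker's default stack reservation of 1 MB (other toolchains use different defaults), and the helper names are hypothetical, not part of the original program.

```cpp
#include <cstddef>

// Assumption: 1 MB is the MSVC linker's default stack reservation.
constexpr std::size_t kDefaultStackBytes = 1024 * 1024;

// Bytes occupied by two local size x size int matrices (a and c).
constexpr std::size_t two_matrices_bytes(std::size_t size) {
    return 2 * size * size * sizeof(int);
}

// True if both matrices stay below the assumed default stack reservation.
// The rest of the stack frame is ignored, so this is only a rough check.
constexpr bool fits_in_default_stack(std::size_t size) {
    return two_matrices_bytes(size) < kDefaultStackBytes;
}
```

With a 4-byte int, size = 100 needs 80,000 bytes and passes the check, while size = 500 needs 2,000,000 bytes and does not.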
You may be able to supply a linker option to increase the stack size (with the MSVC linker, that is /STACK), but a better solution would be to dynamically allocate the space for your arrays. (You could create a class with the arrays in it, then just allocate an instance of that class.)
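For example, move the size constant out of main to file scope (the struct needs it as a compile-time constant) and add a new struct before your main function:

const int size = 500;

struct mats {
    int a[size][size];
    int c[size][size];
};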
Then, in your main, remove the a and c arrays and replace them with the following line (std::make_unique needs #include <memory>):

auto ary = std::make_unique<mats>();
Everywhere you reference a or c, use ary->a and ary->c instead. (The unique_ptr will automatically delete the allocated memory when ary goes out of scope.)