I am trying to implement parallel reduction (min, max, sum, and average) in CUDA.
This is my main code snippet as of now.
int main()
{
    const auto count = 8;
    const int size = count * sizeof(int);
    int h[] = {13, 27, 15, 14, 33, 2, 24, 6};
    int* d;
    int choice = 0;
    do {
        cout << "\n ---MENU--- \n";
        cout << "1. Find Sum of Numbers in Array\n";
        cout << "2. Find Max of Array\n";
        cout << "3. Find Min of Array\n";
        cout << "4. Find Average of Array\n";
        cout << "5. Exit\n";
        cout << "Enter your Choice : ";
        cin >> choice;
        switch (choice) {
        case 1:
            cudaMalloc(&d, size);
            cudaMemcpy(d, h, size, cudaMemcpyHostToDevice);
            sum<<<1, count / 2>>>(d);
            int result;
            cudaMemcpy(&result, d, sizeof(int), cudaMemcpyDeviceToHost);
            cout << "Sum is " << result << endl;
            getchar();
            cudaFree(d);
            delete[] h;
            break;
        case 5:
            break;
        default:
            cout << "Wrong Input!! Try Again!";
            break;
        }
    } while (choice != 5);
    return 0;
}
This is my CUDA Kernel for SUM:
__global__ void sum(int* input)
{
    const int tid = threadIdx.x;
    auto step_size = 1;
    int number_of_threads = blockDim.x;
    while (number_of_threads > 0)
    {
        if (tid < number_of_threads) // still alive?
        {
            const auto fst = tid * step_size * 2;
            const auto snd = fst + step_size;
            input[fst] += input[snd];
        }
        step_size <<= 1;
        number_of_threads >>= 1;
    }
}
On running the program, I am getting this as OUTPUT:
How do I solve this issue? Please help me.
Don't ignore the compiler warnings. You are calling delete[] on a non-dynamically-allocated array. This is undefined behavior and likely the cause of your core dump.
You don't need to call delete[] for arrays on the stack.
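To illustrate the point, here is a minimal host-only sketch (no CUDA involved) that contrasts the three ways of owning the input array. The function names sum_stack_array, sum_heap_array and sum_vector are made up for this example and are not part of the question's code; the values are the ones from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Automatic (stack) storage: the array is released when the function returns.
// No delete[] is needed, and calling delete[] on it is undefined behavior.
int sum_stack_array()
{
    int h[] = {13, 27, 15, 14, 33, 2, 24, 6};
    const std::size_t count = sizeof(h) / sizeof(h[0]);
    return std::accumulate(h, h + count, 0);
}

// Dynamic storage: memory obtained with new[] must be released with delete[].
int sum_heap_array()
{
    const int values[] = {13, 27, 15, 14, 33, 2, 24, 6};
    const std::size_t count = sizeof(values) / sizeof(values[0]);
    int* h = new int[count];
    std::copy(values, values + count, h);
    const int total = std::accumulate(h, h + count, 0);
    delete[] h; // matches the new[] above
    return total;
}

// std::vector owns its buffer and frees it automatically in its destructor.
int sum_vector()
{
    const std::vector<int> h = {13, 27, 15, 14, 33, 2, 24, 6};
    return std::accumulate(h.begin(), h.end(), 0);
}
```

In the question's main(), h is declared like the array in sum_stack_array, so the simplest fix is to delete the line delete[] h; and leave the rest of case 1 as it is.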