I build a simple cuda kernel that performs a sum on elements. Each thread adds an input value to an output buffer. Each thread calculates one value. 2432 threads are being used (19 blocks * 128 threads).
The output buffer remains the same, the input buffer pointer is shifted by threadcount after each kernel execution. So in total, we have a loop invoking the add kernel until we computed all input data.
Example: All my input values are set to 1. The output buffer size is 2432. The input buffer size is 2432 *2000. 2000 times the add kernel is called to add 1 to each field of output. The endresult in output is 2000 at every field. I call the function aggregate which contains a for loop, calling the kernel as often as needed to pass over the complete input data. This works so far unless I call the kernel too often.
However if I call the Kernel 2500 times, I get an illegalmemoryaccess cuda error.
As you can see, the runtime of the last successfull kernel increases by 3 orders of magnitude. Afterwards my pointers are invalidated and the following invocations result in CudaErrorIllegalAdress.
I cleaned up the code to get a minimal working example:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <vector>
#include <stdio.h>
#include <iostream>
using namespace std;
template <class T> __global__ void addKernel_2432(int *in, int * out)
{
int i = blockIdx.x * blockDim.x + threadIdx.x;
out[i] = out[i] + in[i];
}
static int aggregate(int* array, size_t size, int* out) {
size_t const vectorCount = size / 2432;
cout << "ITERATIONS: " << vectorCount << endl;
for (size_t i = 0; i < vectorCount-1; i++)
{
addKernel_2432<int><<<19,128>>>(array, out);
array += vectorCount;
}
addKernel_2432<int> << <19, 128 >> > (array, out);
return 1;
}
int main()
{
int* dev_in1 = 0;
size_t vectorCount = 2432;
int * dev_out = 0;
size_t datacount = 2432*2500;
std::vector<int> hostvec(datacount);
//create input buffer, filled with 1
std::fill(hostvec.begin(), hostvec.end(), 1);
//allocate input buffer and output buffer
cudaMalloc(&dev_in1, datacount*sizeof(int));
cudaMalloc(&dev_out, vectorCount * sizeof(int));
//set output buffer to 0
cudaMemset(dev_out, 0, vectorCount * sizeof(int));
//copy input buffer to GPU
cudaMemcpy(dev_in1, hostvec.data(), datacount * sizeof(int), cudaMemcpyHostToDevice);
//call kernel datacount / vectorcount times
aggregate(dev_in1, datacount, dev_out);
//return data to check for corectness
cudaMemcpy(hostvec.data(), dev_out, vectorCount*sizeof(int), cudaMemcpyDeviceToHost);
if (cudaSuccess != cudaMemcpy(hostvec.data(), dev_out, vectorCount * sizeof(int), cudaMemcpyDeviceToHost))
{
cudaError err = cudaGetLastError();
cout << " CUDA ERROR: " << cudaGetErrorString(err) << endl;
}
else
{
cout << "NO CUDA ERROR" << endl;
cout << "RETURNED SUM DATA" << endl;
for (int i = 0; i < 2432; i++)
{
cout << hostvec[i] << " ";
}
}
cudaDeviceReset();
return 0;
}
If you compile and run it, you get an error. Change:
size_t datacount = 2432 * 2500;
to
size_t datacount = 2432 * 2400;
and it gives the correct results.
I am looking for any ideas, why it breaks after 2432 kernel invocations.
What i have found so far googeling around: Wrong target architecture set. I use a 1070ti. My target is set to: compute_61,sm_61 In visual studio project properties. That does not change anything.
Did I miss something? Is there a limit how many times a kernel can be called until cuda invalidates pointer? Thank you for your help. I used windows, Visual Studio 2019 and CUDA runtime 11.
This is the output in both cases. Succes and failure:
[
Error: [
static int aggregate(int* array, size_t size, int* out) {
size_t const vectorCount = size / 2432;
for (size_t i = 0; i < vectorCount-1; i++)
{
array += vectorCount;
}
}
That's not vectorCount
but the number of iterations you have been accidentally incrementing by. Works fine while vectorCount <= 2432
(but yields wrong results), and results in buffer overflow above.
array += 2432
is what you intended to write.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.