
How do I use global memory correctly in CUDA?

I'm trying to write an application in CUDA that uses global memory declared with __device__. These variables are declared in a .cuh file.

In another file, file.cu, I have my main, in which I do the cudaMalloc and cudaMemcpy calls.

That's a part of my code:

cudaMalloc((void**)&varOne,*tam_varOne * sizeof(cuComplex));
cudaMemcpy(varOne,C_varOne,*tam_varOne * sizeof(cuComplex),cudaMemcpyHostToDevice);

varOne is declared in the .cuh file like this:

    __device__ cuComplex *varOne;

When I launch my kernel (I'm not passing varOne as a parameter) and try to read varOne with the debugger, it says that it can't read the variable. The pointer address is 000..0, so it is obviously wrong.

So, how do I have to declare and copy global memory in CUDA?

First, you need to declare a pointer to the data that will be copied from the CPU to the GPU. In the example below, we want to copy the array original_cpu_array to CUDA global memory.

int original_cpu_array[array_size];   
int *array_cuda;

Calculate the memory size that the data will occupy.

int size = array_size * sizeof(int);

CUDA memory allocation:

msg_erro[0] = cudaMalloc((void **)&array_cuda,size);

Copying from CPU to GPU:

msg_erro[0] = cudaMemcpy(array_cuda, original_cpu_array,size,cudaMemcpyHostToDevice);

Execute the kernel.

Copying from GPU to CPU:

msg_erro[0] = cudaMemcpy(original_cpu_array,array_cuda,size,cudaMemcpyDeviceToHost);

Free Memory:

cudaFree(array_cuda);
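Putting the steps above together, here is a minimal self-contained sketch. The increment kernel and the grid/block sizes are my own choices for illustration, not part of the original answer:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1 to every element.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int array_size = 100;
    int original_cpu_array[array_size];
    for (int i = 0; i < array_size; i++) original_cpu_array[i] = i;

    int *array_cuda;
    int size = array_size * sizeof(int);

    cudaMalloc((void **)&array_cuda, size);                                   // allocate on the GPU
    cudaMemcpy(array_cuda, original_cpu_array, size, cudaMemcpyHostToDevice); // CPU -> GPU

    increment<<<(array_size + 255) / 256, 256>>>(array_cuda, array_size);     // execute kernel

    cudaMemcpy(original_cpu_array, array_cuda, size, cudaMemcpyDeviceToHost); // GPU -> CPU
    cudaFree(array_cuda);                                                     // free device memory

    printf("first element: %d\n", original_cpu_array[0]);
    return 0;
}
```

Compile with nvcc; each call here should also be checked for errors, as discussed next.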

For debugging purposes, I typically save the status of the function calls in an array (e.g., cudaError_t msg_erro[var];). This is not strictly necessary, but it will save you some time if an error occurs during the allocations and memory transfers.

And if errors do occur, I print them using a function like:

void printErros(cudaError_t *erros, int size, int flag)
{
    for (int i = 0; i < size; i++)
        if (erros[i] != cudaSuccess)
        {
            if (flag == 0) printf("Memory allocation ");
            if (flag == 1) printf("CPU -> GPU  ");
            if (flag == 2) printf("GPU -> CPU  ");
            printf("{%d} => %s\n", i, cudaGetErrorString(erros[i]));
        }
}

The flag primarily indicates the part of the code where the error occurred. For instance, after a memory allocation:

msg_erro[0] = cudaMalloc((void **)&array_cuda,size);
printErros(msg_erro,msg_erro_size, 0);
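As an alternative to a status array, a common pattern is a small checking macro that reports the file and line where a call failed. The name CUDA_CHECK is my own; it is not part of the CUDA API:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every runtime call; abort with a readable message on failure.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err));                         \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Usage:
//   CUDA_CHECK(cudaMalloc((void **)&array_cuda, size));
//   CUDA_CHECK(cudaMemcpy(array_cuda, original_cpu_array, size,
//                         cudaMemcpyHostToDevice));
```

This catches errors at the failing call itself rather than at a later check of the array.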

I have experimented with some examples and found that you cannot directly use a global variable in a kernel without passing it as a parameter. Even though you declare it in the .cuh file, you need to initialize it in main().

Reason:

  1. If you declare it globally, the memory is not allocated in GPU global memory. You need to use cudaMalloc((void**)&varOne, sizeof(cuComplex)) to allocate the memory; only that call reserves memory on the GPU. The declaration __device__ cuComplex *varOne; works just as a prototype and variable declaration, but no memory is allocated until cudaMalloc() is called.
  2. Also, you need to initialize *varOne in main() as a host pointer initially. After cudaMalloc() is used, it becomes a device pointer.
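For completeness: if you do want to keep the __device__ pointer from the question instead of passing a parameter, note that host code cannot take the address of a __device__ symbol directly; the runtime provides cudaMemcpyToSymbol (and cudaGetSymbolAddress) to set it. A sketch of that route, with setupVarOne as a hypothetical helper:

```cuda
#include <cuComplex.h>
#include <cuda_runtime.h>

__device__ cuComplex *varOne;   // device-side global pointer, as in the question

void setupVarOne(int n) {
    cuComplex *d_buf;           // ordinary host-side handle to device memory
    cudaMalloc((void **)&d_buf, n * sizeof(cuComplex));
    // Copy the device buffer's address into the __device__ symbol itself:
    cudaMemcpyToSymbol(varOne, &d_buf, sizeof(d_buf));
}
```

After this, kernels can dereference varOne without receiving it as a parameter, which explains why plain cudaMalloc on the symbol left the pointer at 0 in the debugger.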

The sequence of steps is (for my tested code):

int *Ad;        // If you declare this in the .cuh file, you don't need the code shown in main()

__global__ void Kernel(int *Ad){
....
}

int main(){
....
      int A[100];                          // host array to copy from
      int size = 100 * sizeof(int);
      cudaMalloc((void**)&Ad, size);
      cudaMemcpy(Ad, A, size, cudaMemcpyHostToDevice);
      Kernel<<<1, 100>>>(Ad);              // pass the pointer to the kernel
....
}
