
2D arrays with contiguous rows on the heap memory for cudaMemcpy2D()

The CUDA documentation recommends using cudaMemcpy2D() for 2D arrays (and similarly cudaMemcpy3D() for 3D arrays) instead of cudaMemcpy() for better performance, since the former works with pitched device allocations (e.g. from cudaMallocPitch()) whose rows are padded and aligned appropriately. On the other hand, all cudaMemcpy functions, just like memcpy(), require contiguously allocated memory.
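For context, a minimal sketch of what such a pitched allocation looks like (placeholder names, not code from the original post): cudaMallocPitch() pads each device row so that row starts are aligned, and cudaMemcpy2D() reconciles the differing row strides during the copy.

#include <cuda_runtime.h>

// Sketch only: allocate a pitched h x w device buffer of floats.
float* allocatePitched(int h, int w, size_t& pitch)
{
    float* d_ptr = nullptr;
    // pitch receives the device row stride in bytes, >= w * sizeof(float)
    cudaMallocPitch((void**)&d_ptr, &pitch, w * sizeof(float), h);
    return d_ptr;
}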

This is all fine if I create my (host) array as, for example, float myArray[h][w];. However, it most likely will not work if I allocate each row separately, like this:

float** myArray2 = new float*[h];
for( int i = 0 ; i < h ; i++ ){
   myArray2[i] = new float[w];
}

This is not a big problem except when one is trying to integrate CUDA into an existing project, which is the situation I am facing. Right now, I create a temporary 1D array, copy the contents of my 2D array into it, use cudaMemcpy(), and repeat the whole process to get the results back after the kernel launch, but this does not seem like an elegant or efficient approach.
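A rough sketch of that temporary-buffer workaround follows (placeholder names such as d_data; it assumes the device buffer has already been allocated elsewhere):

#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Sketch only: gather the row-by-row 2D array into one contiguous buffer,
// copy it with plain cudaMemcpy(), and scatter the results back afterwards.
void copyViaTemporary(float** myArray2, float* d_data, int h, int w)
{
    std::vector<float> temp(static_cast<size_t>(h) * w);
    for (int i = 0; i < h; i++)   // flatten: one row at a time
        std::copy(myArray2[i], myArray2[i] + w, temp.begin() + static_cast<size_t>(i) * w);

    cudaMemcpy(d_data, temp.data(), temp.size() * sizeof(float), cudaMemcpyHostToDevice);

    // ... kernel launch happens here ...

    cudaMemcpy(temp.data(), d_data, temp.size() * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < h; i++)   // un-flatten: copy each row back
        std::copy(temp.begin() + static_cast<size_t>(i) * w,
                  temp.begin() + static_cast<size_t>(i) * w + w, myArray2[i]);
}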

Is there a better way to handle this situation? Specifically, is there a way to create a genuine 2D array on the heap with contiguously allocated rows, so that I can use cudaMemcpy2D()?

PS: I couldn't find the answer to this question in the following previous similar posts:

Allocate the big array, then use pointer arithmetic to find the actual beginnings of the rows.

float* bigArray = new float[h * w];   // one contiguous block holding all rows
float** myArray2 = new float*[h];     // row pointers into that block
for( int i = 0 ; i < h ; i++ ){
   myArray2[i] = &bigArray[i * w];
}

Your myArray2 array of pointers gives you C/C++-style two-dimensional array behavior, while bigArray gives you the contiguous block of memory that CUDA needs.
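A hedged sketch of how that contiguous block could then be used with cudaMemcpy2D() (d_data and roundTrip are placeholder names; the kernel launch is elided):

#include <cuda_runtime.h>

// Sketch only: round-trip the contiguous host block (h rows of w tightly
// packed floats) through a pitched device buffer using cudaMemcpy2D().
void roundTrip(float* bigArray, int h, int w)
{
    float* d_data = nullptr;
    size_t pitch = 0;
    cudaMallocPitch((void**)&d_data, &pitch, w * sizeof(float), h);

    // Host -> device: the source pitch is simply the packed row width.
    cudaMemcpy2D(d_data, pitch, bigArray, w * sizeof(float),
                 w * sizeof(float), h, cudaMemcpyHostToDevice);

    // ... launch a kernel on d_data, stepping between rows by `pitch` bytes ...

    // Device -> host: results land back in bigArray, reachable via myArray2[i][j].
    cudaMemcpy2D(bigArray, w * sizeof(float), d_data, pitch,
                 w * sizeof(float), h, cudaMemcpyDeviceToHost);

    cudaFree(d_data);
}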
