
2D arrays with contiguous rows on the heap memory for cudaMemcpy2D()

The CUDA documentation recommends using cudaMemcpy2D() for 2D arrays (and similarly cudaMemcpy3D() for 3D arrays) instead of cudaMemcpy() for better performance, since the former works with pitched device allocations (e.g. from cudaMallocPitch()) whose rows are padded and aligned appropriately. On the other hand, all cudaMemcpy functions, just like memcpy(), require contiguously allocated memory.
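For context, a minimal sketch of what such a pitched allocation looks like (placeholder names, not code from the original post): cudaMallocPitch() pads each device row so that row starts are aligned, and cudaMemcpy2D() reconciles the differing row strides during the copy.

#include <cuda_runtime.h>

// Sketch only: allocate a pitched h x w device buffer of floats.
float* allocatePitched(int h, int w, size_t& pitch)
{
    float* d_ptr = nullptr;
    // pitch receives the device row stride in bytes, >= w * sizeof(float)
    cudaMallocPitch((void**)&d_ptr, &pitch, w * sizeof(float), h);
    return d_ptr;
}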

This is all fine if I create my (host) array as, for example, float myArray[h][w];. However, it most likely will not work if I allocate each row separately, like this:

float** myArray2 = new float*[h];
for( int i = 0 ; i < h ; i++ ){
   myArray2[i] = new float[w];
}

This is not a big problem except when one is trying to integrate CUDA into an existing project, which is the situation I am facing. Right now, I create a temporary 1D array, copy the contents of my 2D array into it, use cudaMemcpy(), and repeat the whole process to get the results back after the kernel launch, but this does not seem like an elegant or efficient approach.
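A rough sketch of that temporary-buffer workaround follows (placeholder names such as d_data; it assumes the device buffer has already been allocated elsewhere):

#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Sketch only: gather the row-by-row 2D array into one contiguous buffer,
// copy it with plain cudaMemcpy(), and scatter the results back afterwards.
void copyViaTemporary(float** myArray2, float* d_data, int h, int w)
{
    std::vector<float> temp(static_cast<size_t>(h) * w);
    for (int i = 0; i < h; i++)   // flatten: one row at a time
        std::copy(myArray2[i], myArray2[i] + w, temp.begin() + static_cast<size_t>(i) * w);

    cudaMemcpy(d_data, temp.data(), temp.size() * sizeof(float), cudaMemcpyHostToDevice);

    // ... kernel launch happens here ...

    cudaMemcpy(temp.data(), d_data, temp.size() * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < h; i++)   // un-flatten: copy each row back
        std::copy(temp.begin() + static_cast<size_t>(i) * w,
                  temp.begin() + static_cast<size_t>(i) * w + w, myArray2[i]);
}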

Is there a better way to handle this situation? Specifically, is there a way to create a genuine 2D array on the heap with contiguously allocated rows, so that I can use cudaMemcpy2D()?

PS: I couldn't find the answer to this question in the following previous similar posts:

Allocate the big array, then use pointer arithmetic to find the actual beginnings of the rows.

float* bigArray = new float[h * w];   // one contiguous block holding all rows
float** myArray2 = new float*[h];     // row pointers into that block
for( int i = 0 ; i < h ; i++ ){
   myArray2[i] = &bigArray[i * w];
}

Your myArray2 array of pointers gives you C/C++-style two-dimensional array behavior, while bigArray gives you the contiguous block of memory that CUDA needs.
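A hedged sketch of how that contiguous block could then be used with cudaMemcpy2D() (d_data and roundTrip are placeholder names; the kernel launch is elided):

#include <cuda_runtime.h>

// Sketch only: round-trip the contiguous host block (h rows of w tightly
// packed floats) through a pitched device buffer using cudaMemcpy2D().
void roundTrip(float* bigArray, int h, int w)
{
    float* d_data = nullptr;
    size_t pitch = 0;
    cudaMallocPitch((void**)&d_data, &pitch, w * sizeof(float), h);

    // Host -> device: the source pitch is simply the packed row width.
    cudaMemcpy2D(d_data, pitch, bigArray, w * sizeof(float),
                 w * sizeof(float), h, cudaMemcpyHostToDevice);

    // ... launch a kernel on d_data, stepping between rows by `pitch` bytes ...

    // Device -> host: results land back in bigArray, reachable via myArray2[i][j].
    cudaMemcpy2D(bigArray, w * sizeof(float), d_data, pitch,
                 w * sizeof(float), h, cudaMemcpyDeviceToHost);

    cudaFree(d_data);
}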
