2D arrays with contiguous rows on the heap for cudaMemcpy2D()

The CUDA documentation recommends allocating 2D device arrays with cudaMallocPitch() and copying them with cudaMemcpy2D() (and, similarly, cudaMalloc3D() and cudaMemcpy3D() for 3D arrays) rather than using cudaMalloc() and cudaMemcpy(). The pitched allocation pads each row to meet the device's alignment requirements, which gives better memory-access performance. On the other hand, all cudaMemcpy functions, just like memcpy(), require the host data to sit in contiguous memory: cudaMemcpy() needs a single contiguous block, and cudaMemcpy2D() needs every row to be contiguous, with consecutive rows separated by a constant pitch.
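To make the contiguity requirement concrete, here is a host-only sketch of the copy pattern cudaMemcpy2D() performs. copy2DHost is a hypothetical helper, not part of the CUDA API; it assumes the same argument meaning as cudaMemcpy2D(dst, dpitch, src, spitch, width, height, kind), with width given in bytes. The source rows must start at a constant byte stride (spitch), so a float** whose rows were allocated separately cannot be described by one pointer plus one pitch.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical host-only helper mimicking the copy pattern of
// cudaMemcpy2D(dst, dpitch, src, spitch, width, height, kind):
// `height` rows of `widthBytes` bytes are copied; consecutive rows start
// `spitch` bytes apart in the source and `dpitch` bytes apart in the
// destination. Each row must be contiguous and the row starts evenly spaced.
void copy2DHost(void* dst, std::size_t dpitch,
                const void* src, std::size_t spitch,
                std::size_t widthBytes, std::size_t height) {
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        // Row `row` lives at a fixed offset computed from the pitch.
        std::memcpy(d + row * dpitch, s + row * spitch, widthBytes);
    }
}
```

In the real API the destination pitch usually comes from cudaMallocPitch() and may be larger than the row width, which is why the two pitches are separate parameters.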

This works fine if I create my (host) array as, for example, float myArray[h][w]. However, it most likely will not work if I use something like:

float** myArray2 = new float*[h];
for( int i = 0 ; i < h ; i++ ){
   myArray2[i] = new float[w];   // each row is a separate allocation, so rows need not be adjacent
}
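For comparison, a built-in array such as float myArray[h][w] is a single contiguous block, and the same is true on the heap when the row width is a compile-time constant: new float[h][W] allocates h elements of type float[W] back to back. The sketch below assumes a hypothetical constant W and hypothetical helper names; it does not apply when w is only known at run time, and the jagged float** version above gives no such guarantee because every new float[w] is an independent allocation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical compile-time row width; `new float[h][W]` only compiles
// when the inner dimension is a constant expression.
constexpr std::size_t W = 4;

// Allocates an h x W array on the heap as ONE block: the element type is
// float[W], so the rows are laid out back to back with no gaps.
auto makeHeap2D(std::size_t h) -> float (*)[W] {
    return new float[h][W];
}

// Checks that row i starts exactly i * W floats after row 0 by comparing
// the row addresses as integers.
bool rowsAreContiguous(float (*rows)[W], std::size_t h) {
    const auto base = reinterpret_cast<std::uintptr_t>(&rows[0][0]);
    for (std::size_t i = 0; i < h; ++i) {
        const auto start = reinterpret_cast<std::uintptr_t>(&rows[i][0]);
        if (start - base != i * W * sizeof(float)) return false;
    }
    return true;
}
```

Memory from makeHeap2D is released with a single delete[] on the returned pointer.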

This is not a big problem except when one is trying to integrate CUDA into an existing project, which is the situation I am facing. Right now, I create a temporary 1D array, copy the contents of my 2D array into it, and call cudaMemcpy(); after the kernel launch I repeat the whole process in reverse to get the results back. This does not seem an elegant or efficient approach.
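The current workaround (copy into a temporary 1D buffer, transfer, and copy back after the kernel) can be sketched like this. packRows and unpackRows are hypothetical names, and the cudaMemcpy() calls that would sit between them are omitted; the point is that every transfer costs an extra full pass over the data on the host.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Copies a jagged float** array (h rows of w floats, each allocated
// separately) into one contiguous row-major buffer, the layout that
// cudaMemcpy() expects for an h * w float array.
std::vector<float> packRows(float** rows, std::size_t h, std::size_t w) {
    std::vector<float> flat(h * w);
    for (std::size_t i = 0; i < h; ++i)
        std::copy_n(rows[i], w, flat.data() + i * w);
    return flat;
}

// Copies the contiguous buffer back into the jagged rows, e.g. after the
// results have been copied back from the device.
void unpackRows(const std::vector<float>& flat, float** rows,
                std::size_t h, std::size_t w) {
    for (std::size_t i = 0; i < h; ++i)
        std::copy_n(flat.data() + i * w, w, rows[i]);
}
```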

Is there a better way to handle this situation? Specifically, is there a way to create a genuine 2D array on the heap, with contiguously allocated rows, so that I can use cudaMemcpy2D()?

PS: I couldn't find the answer to this question in the following previous similar posts:

Allocate one big contiguous array, then use pointer arithmetic to find where each row begins.

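float* bigArray = new float[h * w];   // one contiguous block holding all rows
float** myArray2 = new float*[h];     // row pointers into bigArray
for( int i = 0 ; i < h ; i++ ){
   myArray2[i] = &bigArray[i * w];    // row i starts i * w floats into the block
}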
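A minimal sketch of this layout with matching cleanup, assuming the existing code only needs myArray2[i][j] access. The struct and function names are hypothetical. Because the rows are no longer separate allocations, any existing code that calls delete[] myArray2[i] has to be changed to release the two blocks instead.

```cpp
#include <cstddef>

// Hypothetical holder for both pieces of the layout: one contiguous block
// for CUDA and an array of row pointers for myArray2[i][j]-style access.
struct Contiguous2D {
    float* bigArray;   // h * w floats, row-major, contiguous
    float** myArray2;  // myArray2[i] points at row i inside bigArray
};

Contiguous2D allocContiguous2D(std::size_t h, std::size_t w) {
    Contiguous2D a;
    a.bigArray = new float[h * w];
    a.myArray2 = new float*[h];
    for (std::size_t i = 0; i < h; ++i)
        a.myArray2[i] = &a.bigArray[i * w];  // row i starts at offset i * w
    return a;
}

// Two allocations, so exactly two delete[] calls; the individual rows are
// not separate allocations and must not be deleted on their own.
void freeContiguous2D(Contiguous2D& a) {
    delete[] a.myArray2;
    delete[] a.bigArray;
    a.myArray2 = nullptr;
    a.bigArray = nullptr;
}
```

bigArray can then be passed to cudaMemcpy() as h * w * sizeof(float) bytes, or to cudaMemcpy2D() with a source pitch of w * sizeof(float).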

Your myArray2 array of pointers gives you C/C++-style two-dimensional array behavior (myArray2[i][j]), while bigArray gives you the contiguous block of memory that CUDA needs.
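If the project can use the standard library, the same two-part layout can be owned by std::vector so nothing leaks. RowMajor2D is a hypothetical class for illustration; copying is disabled because a copy would keep row pointers into the original object's buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical RAII wrapper: data_ is the contiguous block for CUDA,
// rows_ provides the float** view expected by existing code.
class RowMajor2D {
public:
    RowMajor2D(std::size_t h, std::size_t w)
        : h_(h), w_(w), data_(h * w), rows_(h) {
        for (std::size_t i = 0; i < h_; ++i)
            rows_[i] = data_.data() + i * w_;  // row i starts i * w floats in
    }
    // Non-copyable: a copy would still point into the source's data_.
    RowMajor2D(const RowMajor2D&) = delete;
    RowMajor2D& operator=(const RowMajor2D&) = delete;

    float** rows() { return rows_.data(); }  // legacy float** access
    float* data() { return data_.data(); }   // contiguous block
    std::size_t pitchBytes() const { return w_ * sizeof(float); }  // row stride in bytes
    std::size_t height() const { return h_; }
    std::size_t width() const { return w_; }

private:
    std::size_t h_, w_;
    std::vector<float> data_;
    std::vector<float*> rows_;
};
```

Here a.data() and a.pitchBytes() are the values you would pass to cudaMemcpy2D() as the source pointer and source pitch, with a.pitchBytes() as the width in bytes and a.height() as the height; the destination pitch would typically come from cudaMallocPitch().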
