2D array on the heap with contiguous rows for cudaMemcpy2D()

Question

The CUDA documentation recommends cudaMemcpy2D() for 2D arrays (and, similarly, cudaMemcpy3D() for 3D arrays) instead of cudaMemcpy() for better performance, since these functions work with device memory that is allocated more appropriately (pitched, as with cudaMallocPitch()). On the other hand, all cudaMemcpy functions, just like memcpy(), require contiguously allocated memory.

This is all fine if I create my (host) array as, for example, float myArray[h][w];. However, it most likely will not work if I use something like:

float** myArray2 = new float*[h];
for( int i = 0 ; i < h ; i++ ){
   myArray2[i] = new float[w];
}

This is not a big problem except when one is trying to integrate CUDA into an existing project, which is the situation I am facing. Right now, I create a temporary 1D array, copy the contents of my 2D array into it, call cudaMemcpy(), and then repeat the whole process in reverse to get the results after the kernel launch, but this does not seem elegant or efficient.
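For reference, the workaround described above might look roughly like the following host-only C++ sketch. The helper names flattenRows and restoreRows are hypothetical (they are not from the existing project), and the CUDA calls appear only as comments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy h separately allocated rows of width w into one
// contiguous, row-major temporary buffer.
std::vector<float> flattenRows(float* const* rows, std::size_t h, std::size_t w) {
    std::vector<float> flat(h * w);
    for (std::size_t i = 0; i < h; ++i) {
        std::copy(rows[i], rows[i] + w, flat.data() + i * w);
    }
    return flat;
}

// Hypothetical helper: copy the contiguous buffer back into the separate rows.
void restoreRows(const std::vector<float>& flat, float* const* rows,
                 std::size_t h, std::size_t w) {
    assert(flat.size() == h * w);  // the buffer must hold exactly h rows of w floats
    for (std::size_t i = 0; i < h; ++i) {
        std::copy(flat.data() + i * w, flat.data() + (i + 1) * w, rows[i]);
    }
}

// Between the two calls, the temporary buffer is what goes to the device, e.g.
//   cudaMemcpy(dPtr, flat.data(), flat.size() * sizeof(float), cudaMemcpyHostToDevice);
// where dPtr is assumed to be a device pointer obtained elsewhere.
```

Every transfer costs an extra pass over the whole array on the host, which is the overhead in question.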

Is there a better way to handle this situation? Specifically, is there a way to create a genuine 2D array on the heap with contiguously allocated rows so that I can use cudaMemcpy2D()?

PS: I couldn't find the answer to this question in the following similar earlier posts:

Allocate 2D array with cudaMallocPitch and copying with cudaMemcpy2D
Assigning memory for contiguous 2D array
Dynamic 2d Array non contiguous memory c++ (The second answer in this one is rather puzzling.)

Answer 1

Allocate one big array, then use pointer arithmetic to find the actual beginnings of the rows.

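// One contiguous block holding all h * w elements
float* bigArray = new float[h * w];
// Row pointers into the block, so myArray2[i][j] still works
float** myArray2 = new float*[h];
for( int i = 0 ; i < h ; i++ ){
   myArray2[i] = &bigArray[i * w];
}
// Cleanup later: delete[] myArray2; delete[] bigArray;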

Your myArray2 array of pointers gives you C/C++-style two-dimensional array behavior, while bigArray gives you the contiguous block of memory needed by CUDA.
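If manual new/delete[] bookkeeping is a concern, the same layout could be wrapped in a small class. This is only a sketch under assumptions: the class name Grid2D and its member names are hypothetical, std::vector replaces the raw allocations, and the CUDA calls are shown as comments rather than compiled code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper (not part of the original answer): one contiguous
// block plus a table of row pointers, both owned by std::vector.
class Grid2D {
public:
    Grid2D(std::size_t h, std::size_t w)
        : h_(h), w_(w), data_(h * w), rows_(h) {
        for (std::size_t i = 0; i < h; ++i) {
            rows_[i] = data_.data() + i * w;  // start of row i inside the block
        }
    }

    // A copy would leave rows_ pointing into the other object's storage,
    // so copying is disabled in this sketch.
    Grid2D(const Grid2D&) = delete;
    Grid2D& operator=(const Grid2D&) = delete;

    float* operator[](std::size_t i) { assert(i < h_); return rows_[i]; }
    float* data() { return data_.data(); }   // contiguous block for CUDA copies
    float** rows() { return rows_.data(); }  // for existing code expecting float**
    std::size_t height() const { return h_; }
    std::size_t pitchBytes() const { return w_ * sizeof(float); }  // host row size in bytes

private:
    std::size_t h_, w_;
    std::vector<float> data_;
    std::vector<float*> rows_;
};

// Assuming dPtr and dPitch come from
//   cudaMallocPitch(&dPtr, &dPitch, g.pitchBytes(), g.height());
// a host-to-device copy of a Grid2D g would then look like
//   cudaMemcpy2D(dPtr, dPitch, g.data(), g.pitchBytes(),
//                g.pitchBytes(), g.height(), cudaMemcpyHostToDevice);
```

Existing code can keep indexing with g[i][j] or take g.rows() as a float**, while g.data() supplies the single contiguous source pointer. Note that the host pitch is simply the row width in bytes, whereas the device pitch returned by cudaMallocPitch() may be larger.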

Answer 1: score 2, accepted, 2015-11-03 17:09:29
