
When using 3D CUDA memory, is it better to pass the associated cudaPitchedPtr or just the raw pointer in the cudaPitchedPtr struct?

The example in the NVIDIA programming guide passes the cudaPitchedPtr to the kernel:

__global__ void MyKernel(cudaPitchedPtr devPitchedPtr,int width, int height, int depth)

But instead of that, why not allocate in the same manner and then declare the kernel like this:

__global__ void MyKernel(float* devPtr,int pitch, int width, int height, int depth)

and then access the elements however you like? I would prefer the latter implementation, so why does the programming guide give the other example? (It is a bad example anyway: it illustrates how to access the elements, but it also illustrates a design pattern that should not be implemented with CUDA.)

Edit: I meant to say that float* devPtr is the ptr (void* ptr) member of the cudaPitchedPtr.

I assume you're talking about cudaMalloc3D.

From the CUDA reference regarding cudaMalloc3D:

Allocates at least width * height * depth bytes of linear memory on the device and returns a cudaPitchedPtr in which ptr is a pointer to the allocated memory. The function may pad the allocation to ensure hardware alignment requirements are met.

So

cudaMalloc3D(&pitchedDevPtr, make_cudaExtent(w, h, d));

does essentially the same as:

cudaMalloc(&devPtr, w * h * d);

Apart from possible padding, this is no different from calling cudaMalloc, but you get some convenience: you don't have to calculate the size of your array yourself, you just pass a cudaExtent struct to the function. Of course, the allocation is sized in bytes: the cudaExtent struct carries no information about the size of your data type.
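As a minimal sketch of the point about bytes, the snippet below allocates a volume of floats and prints the layout that comes back. The helper names allocFloatVolume and describeVolume are hypothetical, chosen only for illustration, and the element type float is an assumption taken from the question.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: allocate a w x h x d volume of floats.
// Returns true on success and stores the pitched pointer in *out.
bool allocFloatVolume(size_t w, size_t h, size_t d, cudaPitchedPtr* out)
{
    // The extent width is given in bytes, so scale by the element size.
    cudaExtent extent = make_cudaExtent(w * sizeof(float), h, d);
    return cudaMalloc3D(out, extent) == cudaSuccess;
}

// Hypothetical helper: print the layout of a pitched allocation.
// The depth d is passed in because cudaPitchedPtr does not store it.
void describeVolume(const cudaPitchedPtr& p, size_t d)
{
    std::printf("requested row bytes: %zu, pitch (padded row bytes): %zu\n",
                p.xsize, p.pitch);
    std::printf("rows per slice: %zu, bytes spanned: %zu\n",
                p.ysize, p.pitch * p.ysize * d);
}
```

The pitch is at least as large as the requested row width in bytes. Any difference between the two is the padding that the reference quote mentions.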

Whether you pass the plain pointer or the cudaPitchedPtr to the kernel is a design decision. The cudaPitchedPtr delivers not only the device pointer to your kernel but also the pitch and the logical sizes of the allocation. To save memory, and therefore registers, it holds only the sizes in the x and y directions; the depth is not stored in the struct, so the kernel needs it as a separate argument.
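To make the two options concrete, here is a hedged sketch of both kernel styles performing the same element access. The kernel names addPitched and addRaw and the driver runBothStyles are hypothetical. The raw-pointer variant takes the pitch as size_t (the type of the struct's pitch field) rather than int, which is an adjustment to the signature in the question.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Style 1: pass the whole cudaPitchedPtr, as in the programming guide.
__global__ void addPitched(cudaPitchedPtr vol, int width, int height, int depth, float v)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x >= width || y >= height || z >= depth) return;

    char*  base  = static_cast<char*>(vol.ptr);
    size_t slice = vol.pitch * height;  // bytes per z-slice
    float* row   = reinterpret_cast<float*>(base + z * slice + y * vol.pitch);
    row[x] += v;
}

// Style 2: pass only the raw pointer and the pitch (in bytes).
__global__ void addRaw(float* devPtr, size_t pitch, int width, int height, int depth, float v)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x >= width || y >= height || z >= depth) return;

    char*  base  = reinterpret_cast<char*>(devPtr);
    size_t slice = pitch * height;      // bytes per z-slice
    float* row   = reinterpret_cast<float*>(base + z * slice + y * pitch);
    row[x] += v;
}

// Hypothetical driver: zero a volume, run both kernels on it, and
// return true if every element ends up as 1.0f + 2.0f = 3.0f.
bool runBothStyles(int w, int h, int d)
{
    cudaExtent extent = make_cudaExtent(w * sizeof(float), h, d);
    cudaPitchedPtr vol;
    if (cudaMalloc3D(&vol, extent) != cudaSuccess) return false;
    cudaMemset3D(vol, 0, extent);

    dim3 block(8, 8, 1);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y, d);
    addPitched<<<grid, block>>>(vol, w, h, d, 1.0f);
    addRaw<<<grid, block>>>(static_cast<float*>(vol.ptr), vol.pitch, w, h, d, 2.0f);

    // Copy back into a tightly packed host buffer (host pitch = row bytes).
    std::vector<float> host(static_cast<size_t>(w) * h * d);
    cudaMemcpy3DParms p = {};
    p.srcPtr = vol;
    p.dstPtr = make_cudaPitchedPtr(host.data(), w * sizeof(float), w * sizeof(float), h);
    p.extent = extent;
    p.kind   = cudaMemcpyDeviceToHost;
    bool ok = cudaMemcpy3D(&p) == cudaSuccess;
    cudaFree(vol.ptr);

    for (float value : host) ok = ok && (value == 3.0f);
    return ok;
}
```

Both kernels compute the same address, base + z * pitch * height + y * pitch, then index the row by x. The only difference is whether the pointer and pitch arrive bundled in one struct or as separate arguments.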

EDIT: As pointed out, cudaMalloc3D adds padding to ensure coalesced memory access. But since Compute Capability 1.2, a memory access can be coalesced even if the starting address is not properly aligned. On devices with CC >= 1.2 there is no difference in performance between those two allocations.

Either method is equally valid - it is purely an aesthetic decision on your part.

It is not even clear to me why cudaPitchedPtr has extra members - the only ones that really matter are the base pointer and the pitch.

