
cudaMemcpy2D for shared memory copies

I have some memory allocated on the device as a single allocation of H*W*sizeof(float) bytes.

This is to represent an H*W matrix.

I have code where I need to swap the quadrants of the matrix. Can I use cudaMemcpy2D to accomplish this? Would I just need to specify spitch and dpitch as W*sizeof(float) and use pointers to each quadrant of the matrix?

Also, when the cudaMemcpy documentation says the memory areas must not overlap, does that mean src and dst cannot overlap at all? For example, if I had a 10-byte array that I wanted to shift left by one, would the copy fail?

Thanks

You can use cudaMemcpy2D for moving around sub-blocks which are part of larger pitched linear memory allocations. There is no problem in doing that. The non-overlapping requirement is non-negotiable and it will fail if you try it. The source and destination can come from the same allocation, but the address ranges of the source and destination cannot overlap. If you need to do some "in-situ" copying where there is overlap, you might be better served to write a kernel to do it (see the matrix transpose example in the SDK as a sound way to do that kind of thing).
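For illustration, here is a minimal sketch of copying one quadrant of a row-major H*W float matrix into a separate, non-overlapping buffer with cudaMemcpy2D. The buffer names d_mat and d_tmp, the sizes, and the assumption of even H and W are mine, not from the thread:

```cuda
// Sketch: copy the top-left (H/2) x (W/2) quadrant of a row-major H*W float
// matrix into a separate (non-overlapping) buffer using cudaMemcpy2D.
// Assumes H and W are even; d_mat and d_tmp are hypothetical names.
#include <cuda_runtime.h>

int main() {
    const int H = 8, W = 8;
    float *d_mat = nullptr, *d_tmp = nullptr;
    cudaMalloc(&d_mat, H * W * sizeof(float));
    cudaMalloc(&d_tmp, (H / 2) * (W / 2) * sizeof(float));

    // Source: top-left quadrant of d_mat. Its pitch is the full row width
    // of the big matrix, W * sizeof(float).
    // Destination: a tightly packed (W/2)-wide buffer, so its pitch is
    // (W/2) * sizeof(float).
    cudaMemcpy2D(d_tmp, (W / 2) * sizeof(float),   // dst, dpitch
                 d_mat, W * sizeof(float),         // src, spitch
                 (W / 2) * sizeof(float),          // width of each row in bytes
                 H / 2,                            // number of rows
                 cudaMemcpyDeviceToDevice);

    // To address another quadrant, offset the base pointer: the top-right
    // quadrant starts at d_mat + W/2, and the bottom-left quadrant starts
    // at d_mat + (H/2) * W.

    cudaFree(d_tmp);
    cudaFree(d_mat);
    return 0;
}
```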

I suggest writing a simple kernel to do this matrix manipulation. I think it would be easier to write than using cudaMemcpy2D, and almost certainly faster, assuming you write it so the memory accesses coalesce well.

It's probably easiest to do an out-of-place transform (i.e., different input and output arrays) to avoid clobbering the input matrix. Each thread would simply read from its input offset and write to the transformed offset.

It would be similar to a matrix transpose. There is a matrix transpose example in the CUDA SDK.
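As a rough sketch of that approach (the kernel name, launch configuration, and the assumption of even H and W are mine, not from the thread), an out-of-place quadrant swap can be written so each thread reads one input element and writes it to the diagonally opposite quadrant:

```cuda
// Sketch of an out-of-place quadrant swap (fftshift-style): element (r, c)
// moves to ((r + H/2) % H, (c + W/2) % W). Assumes even H and W.
#include <cuda_runtime.h>

__global__ void swapQuadrants(const float* in, float* out, int H, int W) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int r = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    if (r < H && c < W) {
        int dr = (r + H / 2) % H;   // destination row
        int dc = (c + W / 2) % W;   // destination column
        out[dr * W + dc] = in[r * W + c];
    }
}

// Example launch (hypothetical sizes):
//   dim3 block(16, 16);
//   dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
//   swapQuadrants<<<grid, block>>>(d_in, d_out, H, W);
```

Unlike a transpose, the column index is only shifted, not exchanged with the row index, so both the reads and the writes stay contiguous within a row and coalesce naturally.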
