
cudaMemcpy2D for shared memory copies

I have some memory allocated on the device as a single allocation of H*W*sizeof(float) bytes.

This is to represent an H*W matrix.

I have code where I need to swap the quadrants of the matrix. Can I use cudaMemcpy2D to accomplish this? Would I just need to specify spitch and dpitch as W*sizeof(float) and use pointers to each quadrant of the matrix?

Also, when the cudaMemcpy documentation says the memory areas must not overlap, does that mean src and dst cannot overlap at all? For example, if I had a 10-byte array that I wanted to shift left by one, would the copy fail?

Thanks

You can use cudaMemcpy2D for moving around sub-blocks which are part of larger pitched linear memory allocations. There is no problem in doing that. The non-overlapping requirement is non-negotiable and it will fail if you try it. The source and destination can come from the same allocation, but the address ranges of the source and destination cannot overlap. If you need to do some "in-situ" copying where there is overlap, you might be better served to write a kernel to do it (see the matrix transpose example in the SDK as a sound way to do that kind of thing).
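For illustration, here is a minimal sketch of copying one quadrant of a row-major H*W float matrix into a separate, non-overlapping buffer with cudaMemcpy2D. The buffer names d_mat and d_tmp, the sizes, and the assumption of even H and W are mine, not from the thread:

```cuda
// Sketch: copy the top-left (H/2) x (W/2) quadrant of a row-major H*W float
// matrix into a separate (non-overlapping) buffer using cudaMemcpy2D.
// Assumes H and W are even; d_mat and d_tmp are hypothetical names.
#include <cuda_runtime.h>

int main() {
    const int H = 8, W = 8;
    float *d_mat = nullptr, *d_tmp = nullptr;
    cudaMalloc(&d_mat, H * W * sizeof(float));
    cudaMalloc(&d_tmp, (H / 2) * (W / 2) * sizeof(float));

    // Source: top-left quadrant of d_mat. Its pitch is the full row width
    // of the big matrix, W * sizeof(float).
    // Destination: a tightly packed (W/2)-wide buffer, so its pitch is
    // (W/2) * sizeof(float).
    cudaMemcpy2D(d_tmp, (W / 2) * sizeof(float),   // dst, dpitch
                 d_mat, W * sizeof(float),         // src, spitch
                 (W / 2) * sizeof(float),          // width of each row in bytes
                 H / 2,                            // number of rows
                 cudaMemcpyDeviceToDevice);

    // To address another quadrant, offset the base pointer: the top-right
    // quadrant starts at d_mat + W/2, and the bottom-left quadrant starts
    // at d_mat + (H/2) * W.

    cudaFree(d_tmp);
    cudaFree(d_mat);
    return 0;
}
```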

I suggest writing a simple kernel to do this matrix manipulation. I think it would be easier to write than using cudaMemcpy2D, and almost certainly faster, assuming you write it so the memory accesses coalesce well.

It's probably easiest to do an out-of-place transform (i.e., different input and output arrays) to avoid clobbering the input matrix. Each thread would simply read from its input offset and write to the transformed offset.

It would be similar to a matrix transpose. There is a matrix transpose example in the CUDA SDK.
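As a rough sketch of that approach (the kernel name, launch configuration, and the assumption of even H and W are mine, not from the thread), an out-of-place quadrant swap can be written so each thread reads one input element and writes it to the diagonally opposite quadrant:

```cuda
// Sketch of an out-of-place quadrant swap (fftshift-style): element (r, c)
// moves to ((r + H/2) % H, (c + W/2) % W). Assumes even H and W.
#include <cuda_runtime.h>

__global__ void swapQuadrants(const float* in, float* out, int H, int W) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int r = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    if (r < H && c < W) {
        int dr = (r + H / 2) % H;   // destination row
        int dc = (c + W / 2) % W;   // destination column
        out[dr * W + dc] = in[r * W + c];
    }
}

// Example launch (hypothetical sizes):
//   dim3 block(16, 16);
//   dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
//   swapQuadrants<<<grid, block>>>(d_in, d_out, H, W);
```

Unlike a transpose, the column index is only shifted, not exchanged with the row index, so both the reads and the writes stay contiguous within a row and coalesce naturally.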
