C++: Get BGR image (cv::Mat) from GPU memory (cudaMemcpy2D)

Question

I am working on image processing and have developed OpenCV camera wrappers for an RGB and a monochrome camera. I now have to use an existing CUDA algorithm to process those two image streams, so I have to copy the Mat images to the device myself (the algorithm does not accept GpuMat). I use cv::Mat::ptr to access the image data. When I use cudaMemcpy2D to copy the RGB image back to the host, I get a dark image (zeros only). This happens even when I only upload the image with cudaMemcpy2D and download it again in the next step, with no image processing in between. The same approach works fine for the mono image:

width = 1920;  // image dimensions are the same for mono and BGR
height = 1080;
Mat mat_mono(height, width, CV_8UC1);
Mat mat_mono_disp(height, width, CV_8UC1);
size_t pitch_mono;
uint8_t* image_mono_gpu;
size_t matrixLenMono = width;

cudaMallocPitch(&image_mono_gpu, &pitch_mono, width, height);

mat_mono = MonoCamera.CaptureMat(1); // wrapper for the mono camera that grabs the image

// copy to device
cudaMemcpy2D(image_mono_gpu, pitch_mono, mat_mono.ptr(), width, matrixLenMono, height, cudaMemcpyHostToDevice);

// copy back to host
cudaMemcpy2D(mat_mono_disp.ptr(), matrixLenMono, image_mono_gpu, pitch_mono, matrixLenMono, height, cudaMemcpyDeviceToHost);

namedWindow("Display window", WINDOW_AUTOSIZE);
imshow("Display window", mat_mono_disp);

This is the code for the RGB (or rather BGR) image, where I only receive a dark image after retrieving the image from the device:

Mat mat_BGR(height, width, CV_8UC3);
Mat mat_BGR_disp(height, width, CV_8UC3);
size_t pitch_BGR;
uint8_t* image_BGR_gpu;
size_t matrixLenBGR = width * 3;

cudaMallocPitch(&image_BGR_gpu, &pitch_BGR, matrixLenBGR, height);

mat_BGR = RGBCamera.CaptureMat(1); // wrapper for the RGB camera that grabs the image

// copy to device
cudaMemcpy2D(image_BGR_gpu, pitch_BGR, mat_BGR.ptr(), width, matrixLenBGR, height, cudaMemcpyHostToDevice);

// copy back to host
cudaMemcpy2D(mat_BGR_disp.ptr(), matrixLenBGR, image_BGR_gpu, pitch_BGR, matrixLenBGR, height, cudaMemcpyDeviceToHost);

namedWindow("Display window", WINDOW_AUTOSIZE);
imshow("Display window", mat_BGR_disp);

Does this mean that cv::Mat::ptr only works with the mono image because that is a special case? I don't know what else I have to consider for the BGR image.

Answer 1

As pointed out in a previous answer, when performing a 2D memory copy between an OpenCV Mat and device memory allocated with cudaMallocPitch (or any strided 2D memory), we have to use the step member of the Mat to specify its pitch, i.e. the number of bytes per row including any padding.

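In the provided code, the correct way is to pass mat_BGR.step instead of width as the 4th argument (the source pitch) of cudaMemcpy2D. With width, the source pitch is 1920 bytes, while each BGR row holds 1920 * 3 = 5760 bytes. The CUDA documentation states that the copy width must not exceed either pitch, so the upload presumably fails and the device buffer is never filled. The mono copy works only because one byte per pixel makes the width and the row size in bytes coincide.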

cudaMemcpy2D(image_BGR_gpu, pitch_BGR, mat_BGR.ptr(), mat_BGR.step, matrixLenBGR, height, cudaMemcpyHostToDevice);
                                                              ^^^^
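
To make the pitch rule concrete without a GPU, here is a minimal host-only sketch. Both copy2D and roundTrip are hypothetical helpers written for this illustration, not CUDA or OpenCV functions. copy2D only mimics the documented precondition that the copy width must not exceed either pitch. The 512-byte padding is an assumption standing in for whatever pitch cudaMallocPitch would actually return.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-only stand-in for a pitched 2D copy (not a CUDA API).
// Rejects the copy when widthBytes exceeds either pitch, mirroring the
// precondition documented for cudaMemcpy2D.
bool copy2D(std::uint8_t* dst, std::size_t dstPitch,
            const std::uint8_t* src, std::size_t srcPitch,
            std::size_t widthBytes, std::size_t height)
{
    if (widthBytes > dstPitch || widthBytes > srcPitch) {
        return false;  // invalid pitch: nothing is copied
    }
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(dst + row * dstPitch, src + row * srcPitch, widthBytes);
    }
    return true;
}

// Uploads a test image into a padded buffer using srcPitch as the host pitch,
// downloads it again, and reports whether the image survived unchanged.
bool roundTrip(std::size_t width, std::size_t height, std::size_t channels,
               std::size_t srcPitch)
{
    // Bytes per row of a continuous 8-bit image (what cv::Mat::step would hold).
    const std::size_t rowBytes = width * channels;
    // Assumed padding rule; the real pitch comes from cudaMallocPitch.
    const std::size_t devPitch = (rowBytes + 511) / 512 * 512;

    std::vector<std::uint8_t> host(rowBytes * height);
    for (std::size_t i = 0; i < host.size(); ++i) {
        host[i] = static_cast<std::uint8_t>(i % 251 + 1);  // never zero
    }
    std::vector<std::uint8_t> device(devPitch * height, 0);
    std::vector<std::uint8_t> result(rowBytes * height, 0);

    const bool up = copy2D(device.data(), devPitch,
                           host.data(), srcPitch, rowBytes, height);
    const bool down = copy2D(result.data(), rowBytes,
                             device.data(), devPitch, rowBytes, height);
    return up && down && result == host;
}
```

With the question's dimensions, roundTrip(1920, 1080, 3, 1920) fails because the upload is rejected and the download only returns the zero-initialised buffer, while roundTrip(1920, 1080, 3, 1920 * 3) succeeds. The mono case roundTrip(1920, 1080, 1, 1920) also succeeds, which matches the behaviour described in the question. In real code it is also worth checking the cudaError_t returned by every cudaMemcpy2D call, since a rejected copy is otherwise silent.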

solution1
2 2020-02-04 06:30:07
