Changing cuda::GpuMat values through a custom kernel

Question

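I am using a kernel to "loop" over a live camera stream and highlight regions of a specific color. These regions cannot always be reproduced with some combination of cv::threshold calls, so I am using a kernel instead.

The current kernel is as follows: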

__global__ void custom_kernel(unsigned char* input, unsigned char* output, int width, int height, int colorWidthStep, int outputWidthStep) {
    const int xIndex = blockIdx.x * blockDim.x + threadIdx.x;
    const int yIndex = blockIdx.y * blockDim.y + threadIdx.y;

    if ((xIndex < width) && (yIndex < height)) {
        const int color_tid = yIndex * colorWidthStep + (3*xIndex);
        const int output_tid = yIndex * outputWidthStep + (3*xIndex);
        const unsigned char red   = input[color_tid+0];
        const unsigned char green = input[color_tid+1];
        const unsigned char blue  = input[color_tid+2];
        if (!(red > 100 && blue < 50 && red > 1.0*green)) {
            output[output_tid] = 255;
            output[output_tid+1] = 255; 
            output[output_tid+2] = 255;
        } else {
            output[output_tid] = 0;
            output[output_tid+1] = 0;
            output[output_tid+2] = 0;
        }
    }
}

The kernel is called here:

extern "C" void myFunction(cv::cuda::GpuMat& input, cv::cuda::GpuMat& output) {
    // Calculate total number of bytes of input and output image
    const int colorBytes = input.step * input.rows;
    const int outputBytes = output.step * output.rows;

    unsigned char *d_input, *d_output;

    // Allocate device memory
    SAFE_CALL(cudaMalloc<unsigned char>(&d_input,colorBytes),"CUDA Malloc Failed");
    SAFE_CALL(cudaMalloc<unsigned char>(&d_output,outputBytes),"CUDA Malloc Failed");

    // Copy data from OpenCV input image to device memory
    SAFE_CALL(cudaMemcpy(d_input,input.ptr(),colorBytes,cudaMemcpyHostToDevice),"CUDA Memcpy Host To Device Failed");

    // Specify a reasonable block size
    const dim3 block(16,16);

    // Calculate grid size to cover the whole image
    const dim3 grid((input.cols + block.x - 1)/block.x, (input.rows + block.y - 1)/block.y);

    // Launch the color conversion kernel
    custom_kernel<<<grid,block>>>(d_input,d_output,input.cols,input.rows,input.step,output.step);

    // Synchronize to check for any kernel launch errors
    SAFE_CALL(cudaDeviceSynchronize(),"Kernel Launch Failed");

    // Copy back data from destination device memory to OpenCV output image
    SAFE_CALL(cudaMemcpy(output.ptr(),d_output,outputBytes,cudaMemcpyDeviceToHost),"CUDA Memcpy Device To Host Failed");

    // Free the device memory
    SAFE_CALL(cudaFree(d_input),"CUDA Free Failed");
    SAFE_CALL(cudaFree(d_output),"CUDA Free Failed");
}

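I have included a sample image showing the kernel's result on a red car. As you can see, there are vertical red lines, even though I try to access the RGB/BGR values and set them to either zero or 255.

I used the following as a starting point, but I suspect that cv::Mat and cv::cuda::GpuMat do not store their values in the same way. I read that a GpuMat has only a single ptr to its data and assumed that it would be used together with the blockIdx and blockDim parameters. https://github.com/sshniro/opencv-samples/blob/master/cuda-bgr-grey.cpp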

Specific questions:

What causes the red lines?
How do I change the RGB values correctly?

I am using CUDA 10.2 on Ubuntu 18.04 on an NVIDIA Xavier NX.

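As mentioned in the comments, I changed the arguments of the cudaMemcpy function and removed the cudaMalloc and cudaFree parts. I also reminded myself that OpenCV stores colors in BGR order, so I changed the (+0,+1,+2) offsets in the kernel. I loaded the red car directly via cv::imread to rule out any earlier formatting errors. That worked: the kernel now behaves as expected.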
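To make the channel-order point concrete, here is a minimal host-side sketch of the kernel's color test with the byte offsets chosen by layout. The names ChannelOrder and isTargetRed are hypothetical helpers introduced only for illustration; the thresholds are the ones from the kernel above, and 8-bit interleaved pixels are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Channel order of an interleaved 3-channel pixel (hypothetical helper type).
enum class ChannelOrder { RGB, BGR };

// Returns true when the pixel passes the kernel's "red" test:
// red > 100, blue < 50 and red > green.
inline bool isTargetRed(const std::uint8_t* px, ChannelOrder order) {
    assert(px != nullptr);
    // In BGR order the red byte sits at offset 2 and blue at offset 0;
    // in RGB order it is the other way round. Green is always in the middle.
    const std::uint8_t red   = (order == ChannelOrder::BGR) ? px[2] : px[0];
    const std::uint8_t green = px[1];
    const std::uint8_t blue  = (order == ChannelOrder::BGR) ? px[0] : px[2];
    return red > 100 && blue < 50 && red > green;
}
```

The same three bytes can pass or fail the test depending on which order is assumed, which is why reading an image loaded with cv::imread (BGR) through RGB offsets gives the wrong mask.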

Answer 1

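As @sgarizvi mentioned in the comments, a cv::cuda::GpuMat already resides on the GPU, so I had to use cudaMemcpyDeviceToDevice instead of cudaMemcpyHostToDevice.

There is also no need to allocate new memory; I achieved this by removing the cudaMalloc and cudaFree parts of the code above.

Finally (specific to my case, and possibly different for others), my image input comes from a StereoLabs ZED 2, which publishes its images as RGBA, so the order in memory is R -> G -> B -> A. Converted to OpenCV it is B -> G -> R -> A, with 4 bytes per pixel: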

const int color_tid = yIndex * colorWidthStep + (4*xIndex);
const int output_tid = yIndex * outputWidthStep + (4*xIndex);

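So, to address each pixel correctly, you have to advance the pointer by four times xIndex; use three times xIndex if you only have a BGR/RGB image, and one times xIndex for a grayscale image.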
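The indexing rule can be checked on the CPU before it goes into the kernel. The following is only a sketch under stated assumptions: pixelOffset and maskRedReference are hypothetical names, the data is 8-bit with blue, green, red in the first three bytes of each pixel, and the row step is given in bytes (as GpuMat's step is). It mirrors the kernel's rule of writing 0 for matching pixels and 255 otherwise, and it leaves a fourth (alpha) byte untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offset of pixel (x, y) in an interleaved image whose rows are
// stepBytes apart and whose pixels occupy `channels` bytes each.
inline std::size_t pixelOffset(int x, int y, std::size_t stepBytes, int channels) {
    return static_cast<std::size_t>(y) * stepBytes
         + static_cast<std::size_t>(x) * static_cast<std::size_t>(channels);
}

// CPU reference for the kernel's masking rule (hypothetical helper).
// Only the first three bytes of each output pixel are written.
inline void maskRedReference(const std::uint8_t* input, std::uint8_t* output,
                             int width, int height,
                             std::size_t inputStep, std::size_t outputStep,
                             int channels) {
    assert(channels >= 3);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::uint8_t* in  = input  + pixelOffset(x, y, inputStep,  channels);
            std::uint8_t*       out = output + pixelOffset(x, y, outputStep, channels);
            const std::uint8_t blue = in[0], green = in[1], red = in[2];
            const bool match = red > 100 && blue < 50 && red > green;
            const std::uint8_t value = match ? 0 : 255;  // same polarity as the kernel
            out[0] = value;
            out[1] = value;
            out[2] = value;
        }
    }
}
```

Passing channels = 3 for a 4-channel buffer makes the offsets drift by one byte per pixel along each row, so the channels are misread and the right-hand part of each row is never written.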

Answer 1
0 · accepted · 2020-12-07 07:01:17