CUDA - convert RGB image to Grayscale

I am starting to learn CUDA GPU programming from the Udacity video course (the course is 2 years old). I am using CUDA 5.5 with Visual Studio Express 2012 (student edition, so not all CUDA debugging features are available) on an Nvidia GeForce GT 630M GPU.

So far I have implemented vector addition and a few other simple operations.

Now I am trying to convert an RGB image to grayscale. I am reading the image with OpenCV. (Every method I tried has failed, which is why I am here.)

Below is my .cpp file : https://gist.github.com/abidrahmank/7020863

Below is my .cu file : https://gist.github.com/abidrahmank/7020910

My input image is a simple 64x64 color image. (I first used a 512x512 image, which didn't work, so I reduced it to 64x64 to check whether the size was the problem. It doesn't seem to be.)

Problem

The output image of my CUDA implementation is white: almost every value is 255. There are a few gray pixels here and there, maybe less than 1%; everything else is white.

What I tried:

For three days, I tried the following things:

  1. I thought the problem might be the image size, for example that the number of threads was not optimal, so I reduced the image size. Still the same result.
  2. I tried a similar example: I created a 64x64 array, took two elements at a time, and computed the square of their sum. It worked fine. Here is the code : https://gist.github.com/abidrahmank/7021023
  3. I started checking the data at each stage. The input image just before it is loaded onto the GPU is fine. But the input data, when I check it inside the kernel, is always 255. (Check line 14 here )
  4. Finally, I set all GPU data to zero using cudaMemset and checked the input data inside the kernel. It is still 255.

So I have no option left other than asking on Stack Overflow.

Can anyone tell me what mistake I am making?

Your kernel signature says:

__global__ void kernel(unsigned char* d_in, unsigned char* d_out)

But you call it like:

kernel<<<rows,cols>>>(d_out, d_in);

Which one is the input and which one is the output?
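For illustration, here is a minimal sketch of a launch whose argument order matches the signature above. The kernel body is not the asker's code (the gist is not reproduced here); it is a hypothetical version that assumes an interleaved 3-channel 8-bit image in BGR order, as OpenCV loads it, and the usual 0.299/0.587/0.114 luminance weights. A uniform synthetic image stands in for the OpenCV input so the example has no extra dependencies.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one block per row, one thread per column.
// Assumes interleaved BGR input with 3 bytes per pixel (an assumption).
__global__ void kernel(unsigned char* d_in, unsigned char* d_out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // linear pixel index
    float b = d_in[3 * idx + 0];
    float g = d_in[3 * idx + 1];
    float r = d_in[3 * idx + 2];
    d_out[idx] = (unsigned char)(0.299f * r + 0.587f * g + 0.114f * b);
}

int main()
{
    const int rows = 64, cols = 64;
    std::vector<unsigned char> h_in(rows * cols * 3, 100);  // synthetic test image
    std::vector<unsigned char> h_out(rows * cols, 0);

    unsigned char* d_in = 0;
    unsigned char* d_out = 0;
    cudaMalloc((void**)&d_in, h_in.size());
    cudaMalloc((void**)&d_out, h_out.size());
    cudaMemcpy(d_in, h_in.data(), h_in.size(), cudaMemcpyHostToDevice);

    // Input first, output second, matching the kernel signature.
    kernel<<<rows, cols>>>(d_in, d_out);

    // Report launch and execution errors instead of silently reading garbage.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_out.data(), d_out, h_out.size(), cudaMemcpyDeviceToHost);
    printf("first gray value: %d\n", (int)h_out[0]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

With the arguments swapped, the kernel would read from the uninitialized output buffer and write into the input buffer, which is consistent with the input looking wrong inside the kernel.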

Having done quite a bit of CUDA programming in the past, I would strongly recommend that you use Thrust instead of hand-crafting kernels. Even thrust::for_each is hard to beat with raw kernels.
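As a rough sketch of what the Thrust route could look like for this problem, the example below uses thrust::transform with a small functor. The functor name BgrToGray, the use of uchar3 as the pixel type, the BGR channel order, and the luminance weights are all assumptions made for illustration; none of them come from the asker's code.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/transform.h>

// Hypothetical functor: converts one packed BGR pixel to a gray value.
// x, y, z are assumed to hold blue, green, red respectively.
struct BgrToGray
{
    __host__ __device__
    unsigned char operator()(const uchar3& p) const
    {
        return (unsigned char)(0.114f * p.x + 0.587f * p.y + 0.299f * p.z);
    }
};

int main()
{
    const int rows = 64, cols = 64;

    // Synthetic image standing in for the OpenCV data.
    thrust::host_vector<uchar3> h_in(rows * cols, make_uchar3(10, 20, 30));

    // Assigning a host_vector to a device_vector copies the data to the GPU.
    thrust::device_vector<uchar3> d_in = h_in;
    thrust::device_vector<unsigned char> d_out(rows * cols);

    // Thrust chooses the launch configuration; there are no raw pointers
    // whose order could be mixed up at the call site.
    thrust::transform(d_in.begin(), d_in.end(), d_out.begin(), BgrToGray());

    thrust::host_vector<unsigned char> h_out = d_out;  // copy back to the host
    printf("first gray value: %d\n", (int)h_out[0]);
    return 0;
}
```

Because the input and output vectors have different element types, passing them in the wrong order would fail to compile rather than produce a white image.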

Besides the parameter issue pointed out by DanielKO, you also have a problem with your thread/block settings.

Since you already treat your 2-D image as a 1-D array, here's a good example showing how to set the thread/block configuration for data of arbitrary size.

https://developer.nvidia.com/content/easy-introduction-cuda-c-and-c
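The pattern from that post can be sketched as follows: pick a fixed block size, round the number of blocks up, and guard the kernel with a bounds check. A launch like <<<rows, cols>>> only works while cols stays within the per-block thread limit (1024 on compute capability 2.x and later), so it breaks for wider images. The kernel name bgrToGray, the numPixels parameter, the block size of 256, and the BGR layout are assumptions for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per pixel, with a bounds check so the
// extra threads in the last block do nothing.
__global__ void bgrToGray(const unsigned char* d_in, unsigned char* d_out,
                          int numPixels)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= numPixels) return;  // thread falls outside the image

    float b = d_in[3 * idx + 0];
    float g = d_in[3 * idx + 1];
    float r = d_in[3 * idx + 2];
    d_out[idx] = (unsigned char)(0.299f * r + 0.587f * g + 0.114f * b);
}

int main()
{
    // Deliberately not a multiple of the block size.
    const int rows = 100, cols = 37;
    const int numPixels = rows * cols;

    std::vector<unsigned char> h_in(numPixels * 3, 100);  // synthetic image
    std::vector<unsigned char> h_out(numPixels, 0);

    unsigned char* d_in = 0;
    unsigned char* d_out = 0;
    cudaMalloc((void**)&d_in, h_in.size());
    cudaMalloc((void**)&d_out, h_out.size());
    cudaMemcpy(d_in, h_in.data(), h_in.size(), cudaMemcpyHostToDevice);

    // Fixed block size; round the block count up to cover every pixel.
    const int threadsPerBlock = 256;
    const int blocks = (numPixels + threadsPerBlock - 1) / threadsPerBlock;
    bgrToGray<<<blocks, threadsPerBlock>>>(d_in, d_out, numPixels);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_out.data(), d_out, h_out.size(), cudaMemcpyDeviceToHost);
    printf("blocks: %d, last gray value: %d\n", blocks, (int)h_out[numPixels - 1]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The launch configuration here depends only on the total pixel count, so the same code handles 64x64, 512x512, or sizes that do not divide evenly by the block size.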

