CUDA floating point gives different results

Question

I am converting an image from colour to grayscale using CUDA 5 / VC 2008.

The CUDA kernel is:

__global__ static void rgba_to_grayscale( const uchar4* const rgbaImage, unsigned char * const greyImage,
                                     int numRows, int numCols) 
{
    int pos = blockIdx.x * blockDim.x + threadIdx.x;
    if (pos < numRows * numCols) {
        uchar4 zz = rgbaImage[pos];
        float out = 0.299f * zz.x + 0.587f * zz.y + 0.114f * zz.z;
        greyImage[pos] = (unsigned char) out;
    }

}

The C++ function is:

inline unsigned char rgba_to_grayscale( uchar4 rgbaImage) 
{
    return (unsigned char) 0.299f * rgbaImage.x + 0.587f * rgbaImage.y + 0.114f * rgbaImage.z;
}

and they are both called appropriately. However they are yielding different results.

Original image :

[image: the colour image]

CUDA version:

[image: CUDA result]

Serial CPU version:

[image: serial result]

Can anybody explain why the results are different?

Answer 1

There is no problem with your CUDA function. The CPU version is incorrect. A cast binds more tightly than multiplication, so (unsigned char) is applied to 0.299f alone rather than to the whole sum. That truncates the coefficient to 0 and drops the red term, which is equivalent to the following code:

inline unsigned char rgba_to_grayscale( uchar4 rgbaImage) 
{
    return ((unsigned char) 0.299f) * rgbaImage.x + 0.587f * rgbaImage.y + 0.114f * rgbaImage.z;
}

You have to cast the final result into unsigned char like this:

inline unsigned char rgba_to_grayscale( uchar4 rgbaImage) 
{
    return (unsigned char) (0.299f * rgbaImage.x + 0.587f * rgbaImage.y + 0.114f * rgbaImage.z);
}
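To make the difference concrete, here is a minimal sketch that evaluates both forms on one sample pixel. The Pixel struct is a hypothetical stand-in for CUDA's uchar4 so the snippet compiles without the CUDA headers, and the function names and sample values are illustrative only.

```cpp
// Hypothetical stand-in for CUDA's uchar4 (x, y, z, w channels).
struct Pixel { unsigned char x, y, z, w; };

// Cast applied first: (unsigned char) 0.299f is 0, so the x term is lost.
constexpr unsigned char gray_cast_first(Pixel p) {
    return static_cast<unsigned char>(
        ((unsigned char) 0.299f) * p.x + 0.587f * p.y + 0.114f * p.z);
}

// Cast applied last: the weighted sum is computed in float, then truncated.
constexpr unsigned char gray_cast_last(Pixel p) {
    return static_cast<unsigned char>(
        0.299f * p.x + 0.587f * p.y + 0.114f * p.z);
}

// Sample pixel with x=200, y=100, z=50.
// Cast first: 0 + 58.7 + 5.7 = 64.4, truncated to 64.
static_assert(gray_cast_first(Pixel{200, 100, 50, 255}) == 64, "x term dropped");
// Cast last: 59.8 + 58.7 + 5.7 = 124.2, truncated to 124.
static_assert(gray_cast_last(Pixel{200, 100, 50, 255}) == 124, "full weighted sum");
```

The gap grows with the x channel, which would be consistent with the serial output looking darker wherever the image is strongly red.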

Answer 2

@sga91 was pretty much there, but it also appears that the byte order is different.

inline unsigned char rgba_to_grayscale( uchar4 rgbaImage) 
{
    return (unsigned char) (0.299f * rgbaImage.z + 0.587f * rgbaImage.y + 0.114f * rgbaImage.x);
}

Note that x and z are transposed.

I do remember reading about it before but I cannot find the reference now...
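Whether the pixel data in the question really is stored blue-first is not confirmed here, so treat the following only as a sketch of why channel order matters. It assumes a pure red pixel stored in two layouts; the Pixel struct is a hypothetical stand-in for uchar4, and the function names are illustrative.

```cpp
// Hypothetical stand-in for CUDA's uchar4 (x, y, z, w channels).
struct Pixel { unsigned char x, y, z, w; };

// Assumes red is stored in x (RGBA layout).
constexpr unsigned char gray_red_in_x(Pixel p) {
    return static_cast<unsigned char>(
        0.299f * p.x + 0.587f * p.y + 0.114f * p.z);
}

// Assumes red is stored in z (BGRA layout), so x and z swap weights.
constexpr unsigned char gray_red_in_z(Pixel p) {
    return static_cast<unsigned char>(
        0.299f * p.z + 0.587f * p.y + 0.114f * p.x);
}

// Pure red stored blue-first: x = blue = 0, y = green = 0, z = red = 255.
// Matching formula: 0.299 * 255 = 76.245, truncated to 76.
static_assert(gray_red_in_z(Pixel{0, 0, 255, 255}) == 76, "red weighted as red");
// Mismatched formula: 0.114 * 255 = 29.07, truncated to 29.
static_assert(gray_red_in_x(Pixel{0, 0, 255, 255}) == 29, "red weighted as blue");
```

If the two versions read the same buffer, the same formula has to be used in both; a mismatch shows up as reds and blues trading brightness in the grayscale output.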

2 answers

solution1
8 ACCEPTED 2013-05-12 12:37:13

solution2
0 2013-05-13 14:47:04