float2 matrix (as 1D array) and CUDA

Question

I have to work with a float2 matrix stored as a 1D array. To check a few things, I wrote this code:

#include <stdio.h>
#include <stdlib.h>

#define index(x,y) x+y*N

__global__ void test(float2* matrix_CUDA,int N)
{   
    int i,j;

    i=blockIdx.x*blockDim.x+threadIdx.x;
    j=blockIdx.y*blockDim.y+threadIdx.y;

    matrix_CUDA[index(i,j)].x=i;
    matrix_CUDA[index(i,j)].y=j;

}

int main()
{
    int N=256;

    int i,j;

    //////////////////////////////////////////

    float2* matrix;

    matrix=(float2*)malloc(N*N*sizeof(float2));

    //////////////////////////////////////////

    float2* matrix_CUDA;

    cudaMalloc((void**)&matrix_CUDA,N*N*sizeof(float2));

    //////////////////////////////////////////

    dim3 block_dim(32,2,0);
    dim3 grid_dim(2,2,0);

    test <<< grid_dim,block_dim >>> (matrix_CUDA,N);

    //////////////////////////////////////////

    cudaMemcpy(matrix,matrix_CUDA,N*N*sizeof(float2),cudaMemcpyDeviceToHost);


    for(i=0;i<N;i++)
    {
        for(j=0;j<N;j++)
        {
            printf("%d %d, %f %f\n",i,j,matrix[index(i,j)].x,matrix[index(i,j)].y);
        }
    }


    return 0;
}

I was expecting output like:

0 0, 0.000000 0.000000
0 1, 0.000000 1.000000
0 2, 0.000000 2.000000
0 3, 0.000000 3.000000
...

But what I actually get is:

0 0, -nan 7.265723657
0 1, -nan 152345
0 2, 25.2135235 -nan
0 3, 52354.324534 24.52354234523
...

I suppose this means I have some problem with memory allocation, but I can't find what is wrong with my code. Could someone help me?

Answer 1

Any time you are having trouble with CUDA code, you should always use proper CUDA error checking and run your code with cuda-memcheck before asking for help.

Even if you don't understand the output, it will be useful to others trying to help you.

If you had run this code with cuda-memcheck, you would have gotten (amongst all your other output!) something like this:

$ cuda-memcheck ./t1273
========= CUDA-MEMCHECK
========= Program hit cudaErrorInvalidConfiguration (error 9) due to "invalid configuration argument" on CUDA API call to cudaLaunch.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/lib64/libcuda.so.1 [0x2eea03]
=========     Host Frame:./t1273 [0x3616e]
=========     Host Frame:./t1273 [0x2bfd]
=========     Host Frame:./t1273 [0x299a]
=========     Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x21b15]
=========     Host Frame:./t1273 [0x2a5d]
=========
========= ERROR SUMMARY: 1 error
$

This means something is wrong with the way you configured your kernel launch:

dim3 block_dim(32,2,0);
dim3 grid_dim(2,2,0);

test <<< grid_dim,block_dim >>> (matrix_CUDA,N);
         ^^^^^^^^^^^^^^^^^^
         kernel config arguments

Specifically, you should never use a dimension of zero when creating a dim3 variable for a kernel launch. The minimum value for any component is 1, not 0.

So use arguments like this:

dim3 block_dim(32,2,1);
dim3 grid_dim(2,2,1);

In addition, once you fix that, you will still find that many of your outputs are not touched by your code. To fix that, you'll need to increase the size of your thread array to match the size of your data array. Since you have a 1-D array, it's not really clear to me why you are launching 2D threadblocks and 2D grids. Your data array of N*N = 65536 elements should be completely "coverable" with a total of 65536 threads in a linear dimension (for example, 2048 blocks of 32 threads), something like this:

dim3 block_dim(32,1,1);
dim3 grid_dim(2048,1,1);

solution1
2 ACCPTED 2016-11-05 23:35:42

float2 matrix (as 1D array) and CUDA

Question

1 answers

solution1 2 ACCPTED 2016-11-05 23:35:42

solution1
2 ACCPTED 2016-11-05 23:35:42