Issues with a CUDA program

I am writing a simple CUDA program in which I create a 2D array on the device, perform a very basic operation on it in a kernel, and then copy the result back to a 2D array on the host. I wrote this code after following several threads on Stack Overflow and the CUDA forums, and I followed what was suggested, but every element of the output is 0, whereas I expect every element to be 10. Here is my code:

__global__ void test_kernel(int *dev_ptr[])
{
    int tidx = threadIdx.x;
    int tidy = threadIdx.y;

    dev_ptr[tidx][tidy] = dev_ptr[tidx][tidy] + 10;
}

int main(int argc, char *argv[])
{
    int env_end = 50;
    int **h_ptr;
    int **d_ptr;
    int **env_t;
    int i, k, j;

    /* cpu */
    env_t = (int **)malloc(env_end * sizeof *env_t);
    env_t[k] = (int *)malloc(env_end * env_end * sizeof *env_t[0]);

    for (k = 1; k < env_end; ++k)
        env_t[k] = env_t[k - 1] + env_end;

    memset(*env_t, 0, env_end * env_end * sizeof **env_t);

    for (i = 0; i < env_end; i++)
    {
        for (j = 0; j < env_end; j++)
        {
            printf("%d\t", env_t[i][j]);
        }
        if (j == env_end - 1)
        {
            printf("\n");
        }
    }

    /* gpu */
    h_ptr = (int **)malloc(env_end * sizeof(int *));
    for (i = 0; i < env_end; i++)
    {
        cudaMalloc((void **)&h_ptr[i], env_end * sizeof(int));
    }

    cudaMalloc((void ***)d_ptr, env_end * sizeof(int));

    /* kernel function and declaration */
    dim3 blockDim(env_end, env_end, 1);
    /* ... */

    /* Copying data back to host */
    for (i = 0; i < env_end; i++)
    {
        /* ... */
    }

    for (i = 0; i < env_end; i++)
    {
        for (j = 0; j < env_end; j++)
        {
            printf("%d\t", env_t[i][j]);
        }
        if (j == env_end - 1)
        {
            printf("\n");
        }
    }

    /* Freeing the memory locations */
    for (i = 0; i < env_end; i++)
    {
        /* ... */
    }

    for (i = 0; i < env_end; i++)
    {
        free(env_t[i]);
    }
}

One more thing: I am writing the code in MS Visual Studio 2010, and I get a "Debug Assertion Failed" message. I am not sure what I have done wrong or why this message appears. Thanks for all your help.
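A likely source of the "Debug Assertion Failed" message is the host-side cleanup rather than CUDA itself. The question's env_t is an array of row pointers into a single malloc'd block (rows 1 onwards are set with env_t[k] = env_t[k - 1] + env_end), yet the last loop calls free(env_t[i]) for every row. Only the data block and the pointer array came from malloc, so freeing any other row hands free a pointer into the middle of an allocation; that is undefined behavior, and the MSVC debug heap typically reports it with this kind of assertion. The data block is also stored in env_t[k] while k is still uninitialized, where env_t[0] is presumably intended. The sketch below shows the same layout with a matching allocate/free pair; alloc_matrix and free_matrix are hypothetical helper names, not part of the original code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates a rows x cols int matrix as ONE contiguous
   data block plus an array of row pointers into it, the layout the question
   builds for env_t. Returns NULL if either allocation fails. */
int **alloc_matrix(int rows, int cols)
{
    assert(rows > 0 && cols > 0);

    int **m = malloc(rows * sizeof *m);
    if (m == NULL)
        return NULL;

    /* Exactly one malloc for all the data; m[0] owns it. */
    size_t bytes = (size_t)rows * cols * sizeof *m[0];
    m[0] = malloc(bytes);
    if (m[0] == NULL) {
        free(m);
        return NULL;
    }
    memset(m[0], 0, bytes);

    /* Rows 1..rows-1 are offsets into the same block, not allocations. */
    for (int k = 1; k < rows; ++k)
        m[k] = m[k - 1] + cols;
    return m;
}

/* Frees exactly the two pointers malloc returned. Calling free(m[i]) for
   i > 0 would pass a pointer into the middle of the data block, which is
   undefined behavior. */
void free_matrix(int **m)
{
    if (m != NULL) {
        free(m[0]);
        free(m);
    }
}
```

With this layout m[i][j] and m[0][i * cols + j] name the same element, so the whole matrix can also be handed to a single memcpy or cudaMemcpy through m[0].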

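There are a few issues with this code, including:

  • Size mismatch between h_ptr, env_t, and d_ptr: d_ptr holds env_end pointers, so it needs env_end * sizeof(int *) bytes, not env_end * sizeof(int).
  • Pass the address of the pointer to cudaMalloc, (void **)&d_ptr, instead of casting the pointer itself with (void ***)d_ptr.
  • Do not allocate host memory with cudaMalloc.
  • Optimization: the rows of the 2D array are allocated non-contiguously in global memory, one cudaMalloc per row. Allocate a single 1D block instead and index it as 2D.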
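As a plain-C illustration of the last point, the sketch below keeps the whole matrix in one flat array and computes row-major offsets with an INDEX macro like the one in the answer, here with every argument parenthesized. The helper add_ten_flat is hypothetical: it performs on the host what test_kernel does per thread, and the single pointer/size pair it works on is exactly what one cudaMemcpy call would transfer.

```c
#include <assert.h>
#include <stdlib.h> /* calloc/free for callers */

/* Row-major offset of element (i, j) in a matrix with k columns. Each
   argument is parenthesized so that expressions such as r + 1 expand
   correctly. */
#define INDEX(i, j, k) ((i) * (k) + (j))

/* Hypothetical host-side counterpart of test_kernel: adds 10 to every
   element of a rows x cols matrix stored as one flat array of ints. */
void add_ten_flat(int *a, int rows, int cols)
{
    assert(a != NULL && rows > 0 && cols > 0);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            a[INDEX(i, j, cols)] += 10;
}
```

With an unparenthesized definition, i*k+j, the call INDEX(1 + 1, 3, 4) would expand to 1 + 1*4 + 3 = 8 instead of 11; calls that pass plain variables, as the answer's do, are unaffected.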

Here is the full code:

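#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 10
// Row-major index of element (i, j) in a matrix with k columns.
// Arguments are parenthesized so expressions such as i+1 expand correctly.
#define INDEX(i,j,k) ((i)*(k)+(j))

__global__ void test_kernel(int *dev_ptr, int row_size)
{
    int tidx = threadIdx.x;
    int tidy = threadIdx.y;

    dev_ptr[INDEX(tidx,tidy,row_size)] = dev_ptr[INDEX(tidx,tidy,row_size)] + 10;
}

int main(int argc, char *argv[])
{
    int env_end = SIZE;
    int *d_ptr = NULL;
    int *env_t;
    int i, j;
    size_t bytes = env_end * env_end * sizeof(int);

    // cpu: one contiguous block, indexed as 2D through INDEX()
    env_t = (int *) malloc(bytes);
    memset(env_t, 0, bytes);

    printf("Input Array:\n");
    for (i = 0; i < env_end; i++)
    {
        for (j = 0; j < env_end; j++)
        {
            printf("%d\t", env_t[INDEX(i,j,env_end)]);
        }
        printf("\n");
    }

    // gpu: allocate the same flat layout on the device and copy the input over
    cudaMalloc((void **)&d_ptr, bytes);
    cudaMemcpy(d_ptr, env_t, bytes, cudaMemcpyHostToDevice);

    // kernel function and declaration: one block of env_end x env_end threads
    dim3 blckDim(env_end, env_end, 1);
    test_kernel<<<1, blckDim>>>(d_ptr, env_end);

    // Copying data back to host (cudaMemcpy waits for the kernel to finish)
    cudaMemcpy(env_t, d_ptr, bytes, cudaMemcpyDeviceToHost);

    printf("Output Array:\n");
    for (i = 0; i < env_end; i++)
    {
        for (j = 0; j < env_end; j++)
        {
            printf("%d\t", env_t[INDEX(i,j,env_end)]);
        }
        printf("\n");
    }

    // Freeing the memory locations
    cudaFree(d_ptr);
    free(env_t);

    return 0;
}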
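The answer also lowers SIZE from the question's 50 to 10, which matters: a single block of env_end x env_end threads must stay within the per-block limit of 1024 threads (512 on compute capability 1.x devices). For 50 x 50 = 2500 threads the launch fails with an invalid-configuration error and the kernel never runs. To handle larger sizes, one option is a grid of fixed-size blocks with a bounds check in the kernel. The sketch below is a host-side emulation in plain C of that indexing, with an assumed 16 x 16 block; blocks_for and emulate_launch are hypothetical names, and the loops stand in for blockIdx and threadIdx.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed block edge length: 16 x 16 = 256 threads per block, well under
   the 1024-thread limit. */
#define BLOCK 16

/* Blocks needed to cover n elements along one dimension (ceiling division). */
int blocks_for(int n, int block)
{
    assert(n > 0 && block > 0);
    return (n + block - 1) / block;
}

/* Hypothetical host-side emulation of a 2D launch over an n x n matrix:
   by/bx play the role of blockIdx.y/.x and ty/tx of threadIdx.y/.x.
   Each "thread" computes its global (row, col) and skips positions that
   fall outside the matrix, as a real kernel must once the grid
   overshoots n. */
void emulate_launch(int *a, int n)
{
    int grid = blocks_for(n, BLOCK);
    for (int by = 0; by < grid; ++by)
        for (int bx = 0; bx < grid; ++bx)
            for (int ty = 0; ty < BLOCK; ++ty)
                for (int tx = 0; tx < BLOCK; ++tx) {
                    int row = by * BLOCK + ty;
                    int col = bx * BLOCK + tx;
                    if (row < n && col < n)
                        a[row * n + col] += 10;
                }
}
```

In CUDA the same mapping would read col = blockIdx.x * blockDim.x + threadIdx.x and row = blockIdx.y * blockDim.y + threadIdx.y, launched with dim3 block(16, 16) and blocks_for(n, 16) blocks in each grid dimension; the if (row < n && col < n) check keeps the threads of the last, partially filled blocks from writing past the end of the array.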


