CUDA kernel function not called

Question

I'm getting started with CUDA, and I'm having some issues. The code I've posted below is basically the simplest example off the NVIDIA website, with some memory copies and a print statement added to make sure that it's running correctly.

The code compiles and runs without complaint, but when I print the vector c it comes out all zeros, as if the GPU kernel function isn't being called at all.

This is almost exactly the same as the post "Basic CUDA - getting kernels to run on the device using C++".

The symptoms are the same, although I don't seem to be making the error described there. Any ideas?

#include <stdio.h>

static const unsigned short N = 3;

// Kernel definition
__global__ void VecAdd(float* A, float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
} 

int main()
{
  float *A, *B, *C;
  float a[N] = {1,2,3}, b[N] = {4,5,6}, c[N] = {0,0,0};

  cudaMalloc( (void **)&A, sizeof(float)*N );
  cudaMalloc( (void **)&B, sizeof(float)*N );
  cudaMalloc( (void **)&C, sizeof(float)*N );

  cudaMemcpy( A, a, sizeof(float)*N, cudaMemcpyHostToDevice );
  cudaMemcpy( B, b, sizeof(float)*N, cudaMemcpyHostToDevice );

  VecAdd<<<1, N>>>(A, B, C);

  cudaMemcpy( c, C, sizeof(float)*N, cudaMemcpyHostToDevice );

  printf("%f %f %f\n", c[0],c[1],c[2]);

  cudaFree(A);
  cudaFree(B);
  cudaFree(C);

  return 0;
}

Answer 1

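In the last cudaMemcpy call, you are passing the wrong flag for the copy direction. The destination c is a host array and the source C is device memory, but the flag says host-to-device: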

cudaMemcpy( c, C, sizeof(float)*N, cudaMemcpyHostToDevice );

It should be:

cudaMemcpy( c, C, sizeof(float)*N, cudaMemcpyDeviceToHost );
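
Since the original program "compiles and runs without complaint", it may also help to check the return value of every runtime call and to query cudaGetLastError() after the launch. Below is a minimal sketch of that idea with the copy direction corrected. The CUDA_CHECK macro and the addVectors wrapper are hypothetical names introduced here for illustration; they are not part of the original post or of the CUDA API. Note that the CUDA documentation describes a mismatched copy direction as undefined behavior, so a check like this is not guaranteed to flag that particular mistake, but it does confirm whether the kernel actually launched.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (an assumption, not a CUDA API):
// print the error string and abort if a runtime call fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Same kernel as in the question: one thread per element.
__global__ void VecAdd(const float* A, const float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

// Hypothetical wrapper: computes c[i] = a[i] + b[i] for n host floats.
// Like the question, it launches a single block, so n must not exceed
// the device's maximum number of threads per block.
void addVectors(const float* a, const float* b, float* c, int n)
{
    float *A = NULL, *B = NULL, *C = NULL;
    size_t bytes = sizeof(float) * n;

    CUDA_CHECK(cudaMalloc((void**)&A, bytes));
    CUDA_CHECK(cudaMalloc((void**)&B, bytes));
    CUDA_CHECK(cudaMalloc((void**)&C, bytes));

    CUDA_CHECK(cudaMemcpy(A, a, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(B, b, bytes, cudaMemcpyHostToDevice));

    VecAdd<<<1, n>>>(A, B, C);
    // A kernel launch returns no status itself; ask the runtime
    // whether the launch was accepted.
    CUDA_CHECK(cudaGetLastError());

    // Device -> host this time. cudaMemcpy waits for the kernel to
    // finish before copying.
    CUDA_CHECK(cudaMemcpy(c, C, bytes, cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(A));
    CUDA_CHECK(cudaFree(B));
    CUDA_CHECK(cudaFree(C));
}
```

Calling addVectors(a, b, c, N) from main with the arrays from the question should fill c with the element-wise sums 5, 7 and 9, assuming a working CUDA device is present. Exiting on the first error keeps the sketch short; a real program would more likely return the error code and free the device buffers.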

solution1
4 2014-02-24 09:07:37
