简体   繁体   中英

CUDA kernel function not called

I'm getting started with CUDA, and I'm having some issues. The code I've posted below is basically the simplest example off the NVIDIA website, with some memory copies and a print statement added to make sure that it's running correctly.

The code compiles and runs without complaint, but when I print the vector c it comes out all zeros, as if the GPU kernel function isn't being called at all.

This is almost exactly the same as this post Basic CUDA - getting kernels to run on the device using C++ .

The symptoms are the same, although I don't seem to be making this error. Any ideas?

#include <stdio.h>

static const unsigned short N = 3;

// Kernel definition
__global__ void VecAdd(float* A, float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
} 

int main()
{
  float *A, *B, *C;
  float a[N] = {1,2,3}, b[N] = {4,5,6}, c[N] = {0,0,0};

  cudaMalloc( (void **)&A, sizeof(float)*N );
  cudaMalloc( (void **)&B, sizeof(float)*N );
  cudaMalloc( (void **)&C, sizeof(float)*N );

  cudaMemcpy( A, a, sizeof(float)*N, cudaMemcpyHostToDevice );
  cudaMemcpy( B, b, sizeof(float)*N, cudaMemcpyHostToDevice );

  VecAdd<<<1, N>>>(A, B, C);

  cudaMemcpy( c, C, sizeof(float)*N, cudaMemcpyHostToDevice );

  printf("%f %f %f\n", c[0],c[1],c[2]);

  cudaFree(A);
  cudaFree(B);
  cudaFree(C);

  return 0;
}

In the last cudaMemcpy call, you are passing incorrect flag for memory copy direction.

cudaMemcpy( c, C, sizeof(float)*N, cudaMemcpyHostToDevice );

It should be:

cudaMemcpy( c, C, sizeof(float)*N, cudaMemcpyDeviceToHost );

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM