CUDA kernel function not being called

Question

I'm getting started with CUDA, and I'm having some issues. The code I've posted below is basically the simplest example from the NVIDIA website, with some memory copies and a print statement added to make sure that it's running correctly.

The code compiles and runs without complaint, but when I print the vector c it comes out all zeros, as if the GPU kernel function isn't being called at all.

This is almost exactly the same as the post "Basic CUDA - getting kernels to run on the device using C++".

The symptoms are the same, although I don't seem to be making the error described there. Any ideas?

#include <stdio.h>

static const unsigned short N = 3;

// Kernel definition
__global__ void VecAdd(float* A, float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
} 

int main()
{
  float *A, *B, *C;
  float a[N] = {1,2,3}, b[N] = {4,5,6}, c[N] = {0,0,0};

  cudaMalloc( (void **)&A, sizeof(float)*N );
  cudaMalloc( (void **)&B, sizeof(float)*N );
  cudaMalloc( (void **)&C, sizeof(float)*N );

  cudaMemcpy( A, a, sizeof(float)*N, cudaMemcpyHostToDevice );
  cudaMemcpy( B, b, sizeof(float)*N, cudaMemcpyHostToDevice );

  VecAdd<<<1, N>>>(A, B, C);

  cudaMemcpy( c, C, sizeof(float)*N, cudaMemcpyHostToDevice );

  printf("%f %f %f\n", c[0],c[1],c[2]);

  cudaFree(A);
  cudaFree(B);
  cudaFree(C);

  return 0;
}

Answer 1

In the last cudaMemcpy call, you are passing the wrong flag for the copy direction. The destination c is a host array and the source C is device memory, but the flag says host-to-device:

cudaMemcpy( c, C, sizeof(float)*N, cudaMemcpyHostToDevice );

It should be:

cudaMemcpy( c, C, sizeof(float)*N, cudaMemcpyDeviceToHost );
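
For reference, here is a minimal sketch of the whole program with the corrected flag and with the return value of every runtime call checked. The helpers `ok` and `vec_add_on_device` are hypothetical names introduced for this illustration, not part of the original post or of the CUDA API. The thread does not say which error code the wrong-direction copy returns, so the sketch only assumes that a failed call returns something other than `cudaSuccess`; checking for that is how a silent failure like the one in the question would normally be noticed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same kernel as in the question: one thread per element, single block.
__global__ void VecAdd(const float* A, const float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

// Hypothetical helper (not a CUDA API): print a message when a runtime
// call fails and return whether it succeeded.
static bool ok(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical wrapper: computes c = a + b on the device for n elements.
// Assumes n is small enough to fit in a single thread block.
bool vec_add_on_device(const float* a, const float* b, float* c, int n)
{
    float *A = nullptr, *B = nullptr, *C = nullptr;
    size_t bytes = sizeof(float) * n;

    bool good =
        ok(cudaMalloc((void**)&A, bytes), "cudaMalloc A") &&
        ok(cudaMalloc((void**)&B, bytes), "cudaMalloc B") &&
        ok(cudaMalloc((void**)&C, bytes), "cudaMalloc C") &&
        ok(cudaMemcpy(A, a, bytes, cudaMemcpyHostToDevice), "copy a -> A") &&
        ok(cudaMemcpy(B, b, bytes, cudaMemcpyHostToDevice), "copy b -> B");

    if (good) {
        VecAdd<<<1, n>>>(A, B, C);
        // cudaGetLastError reports launch failures; the blocking
        // device-to-host copy that follows waits for the kernel to finish.
        good = ok(cudaGetLastError(), "VecAdd launch") &&
               ok(cudaMemcpy(c, C, bytes, cudaMemcpyDeviceToHost), "copy C -> c");
    }

    // cudaFree on a null pointer is a no-op, so this is safe on any path.
    cudaFree(A);
    cudaFree(B);
    cudaFree(C);
    return good;
}

int main()
{
    const int N = 3;
    float a[N] = {1, 2, 3}, b[N] = {4, 5, 6}, c[N] = {0, 0, 0};

    if (!vec_add_on_device(a, b, c, N))
        return 1;

    printf("%f %f %f\n", c[0], c[1], c[2]);
    return 0;
}
```

On a machine with a working CUDA device this should print `5.000000 7.000000 9.000000`. If any call fails, the program prints the name of the failing step and the runtime's error string instead of silently leaving c at zero.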

Answer 1, dated 2014-02-24 09:07:37
