CUDA memory not returning to host

Question

Okay, so I'm trying to learn CUDA on the "new" FX 570 I bought for $15 ;D The code reports no errors, and array1_host starts off with the correct values, but after I copy the memory from device to host the values remain the same. The same thing happens if I comment out the second kernel call (I'm trying multiple kernels in this project). I'm rather confused, so thank you for any help :)

#include <cuda_runtime.h>
#include <iostream>

#pragma comment (lib, "cudart")

#define N 5000

__global__ void addArray(float* a, float* b)
{
   a[threadIdx.x] += b[threadIdx.x];
}
__global__ void timesArray(float* a, float* b)
{
   a[threadIdx.x] *= b[threadIdx.x];
}

int main(){
   float array1_host[N];
   float array2_host[N];

   float *array1_device;
   float *array2_device;

   cudaError_t err;

   for(int x = 0; x < N; x++){
       array1_host[x] = (float) x * 2;
       array2_host[x] = (float) x * 6;
   }

   err = cudaMalloc((void**)&array1_device, N*sizeof(float));
   err = cudaMalloc((void**)&array2_device, N*sizeof(float));

   err = cudaMemcpy(array1_device, array1_host, N*sizeof(float),   cudaMemcpyHostToDevice);
   err = cudaMemcpy(array2_device, array2_host, N*sizeof(float), cudaMemcpyHostToDevice);

   dim3 dimBlock( N );
   dim3 dimGrid ( 1 );

   addArray<<<dimGrid, dimBlock>>>(array1_device, array2_device); 
   timesArray<<<dimGrid, dimBlock>>>(array1_device, array2_device);

   err = cudaMemcpy(array1_host, array1_device, N*sizeof(float), cudaMemcpyDeviceToHost);

   cudaFree(array1_device);
   cudaFree(array2_device);

   std::cout << cudaGetErrorString(err) << "\n\n\n\n\n\n";
   std::cout << array1_host;


   cudaDeviceReset();

   system("pause");
   return 0;
}

Answer 1

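You have an error because N is 5000, but the number of threads per block is limited. The limit depends on the device's compute capability (see the feature table on Wikipedia): 512 threads per block on compute capability 1.x and 1024 on 2.x and later. A launch with 5000 threads in one block therefore fails, the kernels never run, and the data comes back unchanged.
Try this code: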

#define K 200

....

dim3 dimBlock( K );
dim3 dimGrid ( N/K );
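
Note that with more than one block, the kernels from the question can no longer index with threadIdx.x alone, because every block would then touch only the first K elements. Below is a minimal sketch of the whole program with that change applied. The extra kernel parameter n, the bounds check, and the helper blocksFor are my own additions for illustration, not part of the original code; the rounding-up in blocksFor only matters when N is not a multiple of K.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

#define N 5000
#define K 200  // threads per block, as in the snippet above

// Hypothetical helper: blocks needed to cover n elements with k threads each.
// Rounds up so that n does not have to be a multiple of k.
inline int blocksFor(int n, int k) { return (n + k - 1) / k; }

__global__ void addArray(float* a, const float* b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    if (i < n)                                      // guard a partial last block
        a[i] += b[i];
}

__global__ void timesArray(float* a, const float* b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] *= b[i];
}

int main()
{
    std::vector<float> array1_host(N), array2_host(N);
    for (int x = 0; x < N; x++) {
        array1_host[x] = (float)x * 2;
        array2_host[x] = (float)x * 6;
    }

    float* array1_device = 0;
    float* array2_device = 0;
    cudaMalloc((void**)&array1_device, N * sizeof(float));
    cudaMalloc((void**)&array2_device, N * sizeof(float));
    cudaMemcpy(array1_device, &array1_host[0], N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(array2_device, &array2_host[0], N * sizeof(float), cudaMemcpyHostToDevice);

    dim3 dimBlock(K);
    dim3 dimGrid(blocksFor(N, K));  // 25 blocks for N = 5000, K = 200

    addArray<<<dimGrid, dimBlock>>>(array1_device, array2_device, N);
    timesArray<<<dimGrid, dimBlock>>>(array1_device, array2_device, N);

    cudaError_t err = cudaGetLastError();  // reports a failed launch, if any
    if (err != cudaSuccess)
        std::printf("kernel launch failed: %s\n", cudaGetErrorString(err));

    err = cudaMemcpy(&array1_host[0], array1_device, N * sizeof(float), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
        std::printf("copy back failed: %s\n", cudaGetErrorString(err));

    // Print elements rather than the array name, which would print an address.
    // If both kernels ran, element i holds (2i + 6i) * 6i = 48 * i * i.
    for (int i = 0; i < 5; i++)
        std::printf("%d: %f\n", i, array1_host[i]);

    cudaFree(array1_device);
    cudaFree(array2_device);
    return 0;
}
```

K = 200 is simply the value from the snippet; any block size within the device limit should work the same way.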

To debug your code, you can call cudaGetLastError() after each kernel launch or other CUDA function call to find out where the error occurs.
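
As a rough sketch of that advice, the program below repeats the launch from the question (5000 threads in a single block) and checks for errors right after it. The helper checkCuda is a hypothetical name introduced here for illustration. On current hardware this launch should be rejected, and the message printed for "addArray launch" shows which step failed.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: print a message when a CUDA call fails.
// Returns true when the call succeeded.
static bool checkCuda(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

__global__ void addArray(float* a, float* b)
{
    a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
    const int n = 5000;
    float* a = 0;
    float* b = 0;

    if (!checkCuda(cudaMalloc((void**)&a, n * sizeof(float)), "cudaMalloc a")) return 1;
    if (!checkCuda(cudaMalloc((void**)&b, n * sizeof(float)), "cudaMalloc b")) return 1;

    // Same launch shape as in the question: one block of 5000 threads.
    addArray<<<1, n>>>(a, b);

    // Launch-configuration errors are reported immediately after the launch.
    checkCuda(cudaGetLastError(), "addArray launch");
    // Errors raised while the kernel runs surface at the next synchronizing call.
    checkCuda(cudaDeviceSynchronize(), "addArray execution");

    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Checking only the return value of the final cudaMemcpy, as the question does, can miss a failed launch, which is why the check sits directly after the kernel call.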

Solution 1
3, accepted, 2013-03-13 09:08:24
