
Issue in a simple CUDA code transferring complex data from host to device

I copied the following code from the internet and am trying to run it on a server with a Tesla C2075 installed, which should support double precision. I also compiled the code with the flag sm_20.

#include <iostream>
#include <iomanip>
#include <fstream>
#include <cuda_runtime.h>
#include <cuComplex.h>
#include <cublas_v2.h>

using namespace std;

typedef double2 Complex;

#define m 1024
#define n 300
#define k 1024

int main(int argc, char *argv[])
{
  Complex _A[m*k], _B[k*n];
  Complex *A, *B;

  cudaMalloc((void**)&A, m*k*sizeof(Complex));
  cudaMalloc((void**)&B, k*n*sizeof(Complex));

  for (int i=0; i<m*k; i++) _A[i] = make_cuDoubleComplex(rand()/(double)RAND_MAX, rand()/(double)RAND_MAX);
  for (int i=0; i<k*n; i++) _B[i] = make_cuDoubleComplex(rand()/(double)RAND_MAX, rand()/(double)RAND_MAX);

  cudaMemcpy( A, _A, (m*k)*sizeof(Complex), cudaMemcpyHostToDevice );
  cudaMemcpy( B, _B, (k*n)*sizeof(Complex), cudaMemcpyHostToDevice );

  return 0;
}

It compiles, but at runtime it always fails with "Segmentation fault (core dumped)". Is there anything wrong with the code? Thanks.

Your arrays _A and _B are most likely too large to fit on the stack. At 16 bytes per element, _A needs 16 MiB and _B about 4.7 MiB, which is more than the default stack limit on many systems (often 8 MiB on Linux). A quick-and-dirty fix is to move the arrays out to global scope. A better fix is to allocate them dynamically using new and delete as follows:

Complex *_A = new Complex[m*k];
Complex *_B = new Complex[k*n];
...
delete [] _A;
delete [] _B;

An even better option, since you're using C++, is to use a std::vector:

std::vector<Complex> _A(m*k);
std::vector<Complex> _B(k*n);

// But now to get the pointer you need this:
cudaMemcpy( A, &_A[0], (m*k)*sizeof(Complex), cudaMemcpyHostToDevice );
// etc.

The &_A[0] syntax takes the address of the first element of the vector. Because a vector stores its elements contiguously, that address works as a pointer to the entire array. The reason to prefer a vector over manually allocating the memory is that destruction and deallocation happen automatically when the variable goes out of scope, which is essential for writing exception-safe code.

You'll also need #include <vector>.
