cudaMemcpy does not copy the host matrix to the device (segmentation fault)

Question

Below is the code that gives me a segmentation fault when I try to print the matrix d_A, which is copied from the host matrix h_A. Printing h_A just before cudaMalloc works, but printing d_A (the device matrix) after cudaMemcpy gives me the error.

I compile with: nvcc -arch=sm_20 Trial.cu -o out

  #include <stdio.h>
  #include <sstream> 
  #include <stdlib.h> 
  #include <time.h> 
  #include <math.h> 
  #include <unistd.h> 
  #include <sys/time.h> 
  #include <stdint.h>
  #include <cuda.h> 
  #include <time.h> 
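  // Report a failed CUDA call together with the file and line where it occurred.
  inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
  {
     if (code != cudaSuccess)
     {
       fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
       if (abort) exit(code);
     }
  }
  // Wraps a CUDA runtime call and checks its return code.
  #define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }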

  void LUdecomposition(float *h_A,float *A_,int dim,unsigned int size_A,int row_A)
  { 
    float *d_A;
    // Allocate the device matrix and copy the host matrix into it.
    gpuErrchk(cudaMalloc(&d_A, size_A*sizeof(float)));
    gpuErrchk(cudaMemcpy(d_A, h_A, size_A*sizeof(float), cudaMemcpyHostToDevice));

    printf("\n D_A");

    // Copy the device matrix back into the host buffer A_ before reading it.
    gpuErrchk(cudaMemcpy(A_,d_A,size_A*sizeof(float), cudaMemcpyDeviceToHost));

    for(unsigned int i=0; i<size_A; i++)
    {
            if (i % row_A == 0) printf("\n");
            printf("%f ", A_[i]);
    }
    printf("\n D_A");      

    gpuErrchk(cudaFree(d_A));
  }
  void input_matrix_generation_A(float *Matrix, unsigned int row, unsigned int column,  unsigned int size)
  {

    for (int i=0; i<size; i++)
    {
            Matrix[i] = rand()%5+1;
            if (i % column == 0) printf("\n");
    }       

  }       
  int main(int argc, char *argv[])
  {
    int m=4;int dim=2;

    int size_A=m*m;
    float *A, *A_;

    A = (float*)malloc(sizeof(float)*size_A);
    input_matrix_generation_A(A,m,m,size_A);

    A_ = (float*)malloc(sizeof(float)*size_A);
    LUdecomposition(A,A_,dim,size_A,m);
    // Print the matrix that was copied back from the device (m values per row).
    for(int i=0; i<size_A; i++)
    {
            if (i % m == 0) printf("\n");
            printf("%f ", A_[i]);
    }

    free(A);
    free(A_);
    return 0;
  }

Answer 1

You are trying to dereference a device pointer on the host, which is undefined behavior and causes the segmentation fault. So the following line of code is invalid:

printf("%f ", d_A[i]);

Also, you are copying back more memory than was allocated, because sizeof(double) is twice sizeof(float):

cudaMemcpy(A_,d_A,size_A*sizeof(double), cudaMemcpyDeviceToHost);

It should be:

cudaMemcpy(A_,d_A,size_A*sizeof(float), cudaMemcpyDeviceToHost);
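
The pattern described in this answer can be sketched as follows. This is a minimal example, assuming a flat float matrix: the device pointer is only ever passed to CUDA API calls, the host reads a separate host buffer, and one byte count is used for the allocation and both copies. The names CHECK and roundTrip are hypothetical and do not come from the question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from the question): abort on any CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Copies n floats host -> device -> host. d_buf is never dereferenced here.
void roundTrip(const float *h_in, float *h_out, size_t n)
{
    const size_t bytes = n * sizeof(float);  // one byte count used everywhere
    float *d_buf = NULL;
    CHECK(cudaMalloc(&d_buf, bytes));
    CHECK(cudaMemcpy(d_buf, h_in, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(h_out, d_buf, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_buf));
}

int main()
{
    const int m = 4;
    float h_A[m * m], h_copy[m * m];
    for (int i = 0; i < m * m; i++) h_A[i] = (float)(i + 1);

    roundTrip(h_A, h_copy, m * m);

    // Print the host copy, not the device pointer.
    for (int i = 0; i < m * m; i++) {
        if (i % m == 0) printf("\n");
        printf("%f ", h_copy[i]);
    }
    printf("\n");
    return 0;
}
```

Computing the byte count once and reusing it avoids the float/double mismatch pointed out above.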

Answer 2

In your code at about line 23, you write:

for(int i=0; i<size_A; i++)
{
    if (i % row_A == 0) printf("\n");
    printf("%f ", d_A[i]);
}

and this is the part that triggers the segmentation fault.

Please note that the device pointer d_A points into global memory on the GPU and must never be dereferenced directly on the CPU side.
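
If the goal is to inspect the values as they sit in device memory, one option is to read them inside a kernel, where dereferencing d_A is legal. The following is a minimal sketch, assuming a device of compute capability 2.0 or higher (which the -arch=sm_20 flag in the question targets) so that device-side printf is available. The names printMatrix and printOnDevice are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: runs on the GPU, so d_A may be dereferenced here.
__global__ void printMatrix(const float *d_A, int size, int row)
{
    // A single thread prints everything so the output stays in order.
    for (int i = 0; i < size; i++) {
        if (i % row == 0) printf("\n");
        printf("%f ", d_A[i]);
    }
    printf("\n");
}

// Hypothetical host wrapper: returns 0 on success, 1 on any CUDA error.
int printOnDevice(const float *h_A, int size, int row)
{
    float *d_A = NULL;
    const size_t bytes = size * sizeof(float);
    if (cudaMalloc(&d_A, bytes) != cudaSuccess) return 1;
    if (cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_A);
        return 1;
    }
    printMatrix<<<1, 1>>>(d_A, size, row);
    // Device-side printf output is flushed when the host synchronizes.
    cudaError_t err = cudaDeviceSynchronize();
    cudaFree(d_A);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main()
{
    const int m = 4, size = m * m;
    float h_A[size];
    for (int i = 0; i < size; i++) h_A[i] = (float)(i % 5 + 1);
    return printOnDevice(h_A, size, m);
}
```

For anything beyond quick debugging, copying the data back with cudaMemcpy and printing it on the host is usually the simpler choice.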

Answer 1
2 2013-12-02 06:46:58

Answer 2
1 2013-12-02 06:46:44
