
Segmentation fault with vector addition in CUDA

I was experimenting with a toy CUDA program.

I declare a float array, transfer it to the GPU, add a number to each element of the array, transfer it back to the host, and print the array. However, this does not work: the program crashes with a segmentation fault.

Here's the code:

#include <iostream>
using namespace std;

__global__ void kern(float *a, float *C){
    for (int i = 0; i < 3; i++) C[i] = a[i] + i;
}

int main(){
    float *A = new float[3];
    for(int i = 0; i < 3; i++){
        A[i] = i;
    }

    float * d;
    float * C;
    cudaMalloc(&C, sizeof(float)*3);
    cudaMalloc(&d, sizeof(float)*3);
    cudaMemcpy(&d, A, sizeof(float)*3, cudaMemcpyHostToDevice);
    kern<<<1, 1>>>(d, C);

    cudaMemcpy(&A, C, sizeof(float)*3, cudaMemcpyDeviceToHost);

    cout << A[2];

}

Also, I am not familiar with malloc. Most of my experience is with C++, so I am more comfortable with new datatype[]. Is there an equivalent for CUDA?

Change this:

cudaMemcpy(&d, A, sizeof(float)*3, cudaMemcpyHostToDevice);
cudaMemcpy(&A, C, sizeof(float)*3, cudaMemcpyDeviceToHost);

To this:

cudaMemcpy(d, A, sizeof(float)*3, cudaMemcpyHostToDevice);
cudaMemcpy(A, C, sizeof(float)*3, cudaMemcpyDeviceToHost);
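
The reason, as far as I can tell: cudaMalloc takes the address of the pointer (&d) because it has to write the device address into it, but cudaMemcpy takes the pointer value itself. Passing &A as the destination most likely makes the second copy overwrite the host pointer variable A with the result bytes, so A[2] then dereferences a garbage address and segfaults. Below is a minimal sketch of the corrected sequence. The helper name addIndex, the use of std::vector, and the extra n parameter on the kernel are my own assumptions for illustration, not part of the original code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Same idea as the kernel in the question: a single thread adds i to element i.
__global__ void kern(const float *a, float *c, int n) {
    for (int i = 0; i < n; i++) c[i] = a[i] + i;
}

// Hypothetical helper (not in the question): copies `host` to the GPU,
// runs the kernel, and returns the result as a new host vector.
std::vector<float> addIndex(const std::vector<float> &host) {
    const int n = static_cast<int>(host.size());
    const size_t bytes = n * sizeof(float);
    std::vector<float> result(n);

    float *d = nullptr;  // device input buffer
    float *C = nullptr;  // device output buffer
    cudaMalloc(&d, bytes);  // cudaMalloc writes into the pointer, so it needs &d
    cudaMalloc(&C, bytes);

    // cudaMemcpy reads/writes the buffers, so it needs the pointers themselves.
    cudaMemcpy(d, host.data(), bytes, cudaMemcpyHostToDevice);
    kern<<<1, 1>>>(d, C, n);
    cudaMemcpy(result.data(), C, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d);
    cudaFree(C);
    return result;
}
```

Calling addIndex with the values 0, 1, 2 from the question should give back each element plus its index. Return codes are deliberately ignored here to keep the sketch short.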

Also, it's always better to check the return codes of CUDA calls; they will give you a much better idea of what is going wrong.
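
One common way to do that is to wrap every runtime call in a small checking macro. The sketch below is only an illustration: CUDA_CHECK and runChecked are hypothetical names I made up (they are not part of the CUDA API), while cudaGetErrorString, cudaGetLastError and cudaDeviceSynchronize are real runtime functions.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper macro (the name is an assumption, not a CUDA API):
// evaluates a runtime call and exits with a readable message if it failed.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n",         \
                         __FILE__, __LINE__, #call,                \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

// Single-thread kernel in the spirit of the question's code.
__global__ void kern(const float *a, float *c, int n) {
    for (int i = 0; i < n; i++) c[i] = a[i] + i;
}

// Hypothetical driver: same steps as the question, with every call checked.
void runChecked(const float *in, float *out, int n) {
    const size_t bytes = n * sizeof(float);
    float *d = nullptr;
    float *C = nullptr;
    CUDA_CHECK(cudaMalloc(&d, bytes));
    CUDA_CHECK(cudaMalloc(&C, bytes));
    CUDA_CHECK(cudaMemcpy(d, in, bytes, cudaMemcpyHostToDevice));
    kern<<<1, 1>>>(d, C, n);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran
    CUDA_CHECK(cudaMemcpy(out, C, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d));
    CUDA_CHECK(cudaFree(C));
}
```

A kernel launch does not return an error code directly, which is why the sketch queries cudaGetLastError right after the launch and then synchronizes before copying the result back.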
