CUDA - Generating an array on the GPU and modifying it with a kernel

Question

In this code I'm generating a 1D array of floats on a GPU using CUDA. The numbers are between 0 and 1. For my purpose I need them to be between -1 and 1, so I have made a simple kernel that multiplies each element by 2 and then subtracts 1 from it. However, something is going wrong here. When I print my original array into a .bmp I get this http://i.imgur.com/IS5dvSq.png (a typical noise pattern). But when I try to modify that array with my kernel I get a blank black picture http://imgur.com/cwTVPTG . The program runs, but in the debugger I get this:

First-chance exception at 0x75f0c41f in Midpoint_CUDA_Alpha.exe: Microsoft C++ exception: cudaError_enum at memory location 0x003cfacc..

First-chance exception at 0x75f0c41f in Midpoint_CUDA_Alpha.exe: Microsoft C++ exception: cudaError_enum at memory location 0x003cfb08..

First-chance exception at 0x75f0c41f in Midpoint_CUDA_Alpha.exe: Microsoft C++ exception: [rethrow] at memory location 0x00000000..

I would be thankful for any help, or even a small hint, in this matter. Thanks! (edited)

#include <device_functions.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include "stdafx.h"
#include "EasyBMP.h"
#include <curand.h> //curand.lib must be added in project properties > linker > input
#include "device_launch_parameters.h"

float *heightMap_cpu;
float *randomArray_gpu;
int randCount = 0;
int rozmer = 513;

void createRandoms(int size){
    curandGenerator_t generator;
    cudaMalloc((void**)&randomArray_gpu, size*size*sizeof(float));
    curandCreateGenerator(&generator,CURAND_RNG_PSEUDO_XORWOW);
    curandSetPseudoRandomGeneratorSeed(generator,(int)time(NULL));
    curandGenerateUniform(generator,randomArray_gpu,size*size);
}

__global__ void polarizeRandoms(int size, float *randomArray_gpu){
    int index = threadIdx.x + blockDim.x * blockIdx.x;
    if(index<size*size){
        randomArray_gpu[index] = randomArray_gpu[index]*2.0f - 1.0f;
    }
}

//helper function for getting the 1D address from 2D coords
int ad(int x,int y){
    return x*rozmer+y;
}

void printBmp(){
    BMP AnImage;
    AnImage.SetSize(rozmer,rozmer);
    AnImage.SetBitDepth(24);
    int i,j;
    for(i=0;i<=rozmer-1;i++){
        for(j=0;j<=rozmer-1;j++){
            AnImage(i,j)->Red = (int)((heightMap_cpu[ad(i,j)]*127)+128);
            AnImage(i,j)->Green = (int)((heightMap_cpu[ad(i,j)]*127)+128);
            AnImage(i,j)->Blue = (int)((heightMap_cpu[ad(i,j)]*127)+128);
            AnImage(i,j)->Alpha = 0;
        }
    }
    AnImage.WriteToFile("HeightMap.bmp");
}

int main(){
    createRandoms(rozmer);
    polarizeRandoms<<<((rozmer*rozmer)/1024)+1,1024>>>(rozmer,randomArray_gpu);
    heightMap_cpu = (float*)malloc((rozmer*rozmer)*sizeof(float));
    cudaMemcpy(heightMap_cpu,randomArray_gpu,rozmer*rozmer*sizeof(float),cudaMemcpyDeviceToHost);
    printBmp();

    //cleanup
    cudaFree(randomArray_gpu);
    free(heightMap_cpu);
    return 0;
}
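As a quick sanity check of the arithmetic described above, the sketch below mirrors the two expressions in the code: the kernel's `u*2.0f - 1.0f`, which maps [0, 1] to [-1, 1], and printBmp's `(int)(v*127 + 128)`, which maps [-1, 1] to channel values from 1 to 255. The helper names `polarize` and `toChannel` are hypothetical; marking them `__host__ __device__` is an assumption that lets the same helper be called from a kernel and from host code.

```cuda
#include <cassert>

// Hypothetical helper mirroring the body of polarizeRandoms: [0, 1] -> [-1, 1].
__host__ __device__ inline float polarize(float u) { return u * 2.0f - 1.0f; }

// Hypothetical helper mirroring printBmp: [-1, 1] -> [1, 255] for one 8-bit channel.
__host__ __device__ inline int toChannel(float v) { return (int)(v * 127 + 128); }
```

For example, `toChannel(polarize(0.0f))` is 1 and `toChannel(polarize(1.0f))` is 255, so uniformly distributed input should still give a full-range noise image after polarizing; the arithmetic on its own does not explain an all-black picture.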

Answer 1

This is wrong:

cudaMalloc((void**)&randomArray_gpu, size*size*sizeof(float));

We don't use cudaMalloc with __device__ variables. If you do proper CUDA error checking, I'm pretty sure that line will throw an error.

If you really want to use a __device__ pointer this way, you need to create a separate ordinary pointer, cudaMalloc that, and then copy the pointer value to the device pointer using cudaMemcpyToSymbol:

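float *my_dev_pointer;  // ordinary pointer variable that receives the device address
cudaMalloc((void**)&my_dev_pointer, size*size*sizeof(float));
cudaMemcpyToSymbol(randomArray_gpu, &my_dev_pointer, sizeof(float *));  // copy the pointer value into the __device__ variable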
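Put together, the pattern above might look like the following self-contained sketch. It assumes a `__device__` pointer named `d_heights` standing in for `randomArray_gpu`, a hypothetical kernel `polarizeGlobal` that reads that pointer directly rather than taking it as a parameter, and a hypothetical host function `polarizeViaSymbol` that returns `false` if any CUDA call fails; the 256-thread block size is an arbitrary choice.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// __device__ pointer variable, standing in for randomArray_gpu in the question.
__device__ float *d_heights;

// Hypothetical kernel: uses the __device__ pointer directly, no pointer parameter.
__global__ void polarizeGlobal(int n) {
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < n) d_heights[i] = d_heights[i] * 2.0f - 1.0f;
}

// Hypothetical host function: allocates through an ordinary pointer, copies that
// pointer value into d_heights, polarizes a copy of `host` on the GPU and copies
// the result back into `host`. Returns false if any CUDA call fails.
bool polarizeViaSymbol(float *host, int n) {
    const size_t bytes = n * sizeof(float);
    float *my_dev_pointer = nullptr;  // ordinary variable holding a device address
    bool ok = cudaMalloc((void **)&my_dev_pointer, bytes) == cudaSuccess;
    // Copy the pointer value (not the array contents) into the __device__ variable.
    ok = ok && cudaMemcpyToSymbol(d_heights, &my_dev_pointer, sizeof(float *)) == cudaSuccess;
    ok = ok && cudaMemcpy(my_dev_pointer, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        polarizeGlobal<<<(n + 255) / 256, 256>>>(n);
        ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    }
    ok = ok && cudaMemcpy(host, my_dev_pointer, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(my_dev_pointer);  // release the buffer (cudaFree on a null pointer does nothing)
    return ok;
}
```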

Whenever you are having trouble with your CUDA programs, you should do proper CUDA error checking. It will likely focus your attention on what is wrong.
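One common way to do that checking is to wrap every runtime call in a macro and, after a kernel launch, check both `cudaGetLastError` (launch configuration errors) and `cudaDeviceSynchronize` (errors raised while the kernel runs). The macro name `CUDA_CHECK`, the kernel `polarizeKernel` and the wrapper `polarizeChecked` are hypothetical, and exiting on the first error is only one possible policy.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: evaluates a CUDA runtime call once and, on failure,
// prints the error string with file and line, then terminates the program.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",        \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Same body as the question's polarizeRandoms, but taking the element count directly.
__global__ void polarizeKernel(int n, float *a) {
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < n) a[i] = a[i] * 2.0f - 1.0f;
}

// Hypothetical wrapper: launches the kernel, then checks the launch itself
// (cudaGetLastError) and any error raised during execution (cudaDeviceSynchronize).
void polarizeChecked(float *d_a, int n) {
    const int threads = 1024;
    const int blocks = (n + threads - 1) / threads;
    polarizeKernel<<<blocks, threads>>>(n, d_a);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());
}
```

In the question's program the same macro would wrap cudaMalloc, cudaMemcpy and cudaFree; the cuRAND calls return `curandStatus_t` rather than `cudaError_t`, so they would need a similar check against `CURAND_STATUS_SUCCESS`.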

And, yes, kernels can access __device__ variables without the variable being passed explicitly as a parameter to the kernel.

The programming guide covers the proper usage of __device__ variables and the API functions that should be used to access them from the host.
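For a `__device__` array whose size is known at compile time, no cudaMalloc is needed at all: the host moves data in and out with `cudaMemcpyToSymbol` and `cudaMemcpyFromSymbol`, and `cudaGetSymbolAddress` returns an ordinary device pointer to the same storage. The names `kN`, `d_values`, `polarizeStatic`, `roundTrip` and `zeroViaAddress` are hypothetical, and the 4-element size is only for illustration.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical fixed-size device array; its size must be a compile-time constant.
constexpr int kN = 4;
__device__ float d_values[kN];

// The kernel accesses d_values directly, without a pointer parameter.
__global__ void polarizeStatic() {
    int i = threadIdx.x;
    if (i < kN) d_values[i] = d_values[i] * 2.0f - 1.0f;
}

// Copies `host` into d_values, polarizes it on the GPU, and copies it back.
bool roundTrip(float (&host)[kN]) {
    bool ok = cudaMemcpyToSymbol(d_values, host, sizeof(host)) == cudaSuccess;
    if (ok) {
        polarizeStatic<<<1, kN>>>();
        ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    }
    return ok && cudaMemcpyFromSymbol(host, d_values, sizeof(host)) == cudaSuccess;
}

// cudaGetSymbolAddress yields a plain device pointer to d_values, usable with
// APIs that expect an ordinary pointer (here cudaMemset zeroes the array).
bool zeroViaAddress() {
    void *addr = nullptr;
    if (cudaGetSymbolAddress(&addr, d_values) != cudaSuccess) return false;
    return cudaMemset(addr, 0, kN * sizeof(float)) == cudaSuccess;
}
```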

Answer 1: 3 votes, accepted, 2013-09-11 05:03:34