CUDA: Create 3D texture and cudaArray (3D) from device memory

I'm trying to create a 3D texture from part of a device array.

To do this, these are my steps:

  1. malloc device array
  2. Write device array
  3. Create cudaArray (3D)
  4. Bind texture to cudaArray

The way I'm doing it produces no compiler errors, but when I run cuda-memcheck it fails when I try to fetch data from the texture.

Invalid global read of size 8 .. Address 0x10dfaf3a0 is out of bounds

That's why I'm guessing I declared the texture array wrong. Here is how I access the texture:

tex3D(NoiseTextures[i],x,y,z)

The way I'm doing the steps mentioned above:

1. Malloc device array

cudaMalloc((void **)&d_Noise, sqrSizeNoise*nNoise*sizeof(float));

2. Write device array

curandCreateGenerator(&gen,CURAND_RNG_PSEUDO_DEFAULT);
curandSetPseudoRandomGeneratorSeed(gen,Seed);
curandGenerateUniform(gen, d_Noise, sqrSizeNoise*nNoise);
curandDestroyGenerator(gen);

3+4. Creating the CUDA array and binding it to the texture (I'm guessing the mistake is here)

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();//cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
cudaArray *d_cuArr;
cudaMalloc3DArray(&d_cuArr, &channelDesc, make_cudaExtent(SizeNoise,SizeNoise,SizeNoise), 0);
cudaMemcpy3DParms copyParams = {0};

//Loop for every separated Noise Texture (nNoise = 4)
for(int i = 0; i < nNoise; i++){

    //initialize the textures
    NoiseTextures[i] = texture<float, 3, cudaReadModeElementType>(1,cudaFilterModeLinear,cudaAddressModeWrap,channelDesc);

    //Array creation
    //+(sqrSizeNoise*i) is to separate the created Noise Array into smaller parts with the size of SizeNoise^3
    copyParams.srcPtr   = make_cudaPitchedPtr(d_Noise+(sqrSizeNoise*i), SizeNoise*sizeof(float), SizeNoise, SizeNoise);
    copyParams.dstArray = d_cuArr;
    copyParams.extent   = make_cudaExtent(SizeNoise,SizeNoise,SizeNoise);
    copyParams.kind     = cudaMemcpyDeviceToDevice;
    checkCudaErrors(cudaMemcpy3D(&copyParams));
    //Array creation End

    //new Bind
    // set texture parameters
    NoiseTextures[i].normalized = true;                      // access with normalized texture coordinates
    NoiseTextures[i].filterMode = cudaFilterModeLinear;      // linear interpolation
    NoiseTextures[i].addressMode[0] = cudaAddressModeWrap;   // wrap texture coordinates
    NoiseTextures[i].addressMode[1] = cudaAddressModeWrap;
    NoiseTextures[i].addressMode[2] = cudaAddressModeWrap;

    // bind array to 3D texture
    checkCudaErrors(cudaBindTextureToArray(NoiseTextures[i], d_cuArr, channelDesc));
    //end Bind
}
cudaFreeArray(d_cuArr);

I've pasted this code snippet to Pastebin so it's easier to look at with colors etc.: http://pastebin.com/SM3dYd38

I hope I described my problem clearly. If not, please comment!

Can you help me with this? Thanks for reading,

Cery

Edit: Here is the complete code so you can try it on your own machine:

#include <helper_cuda.h>  
#include <helper_functions.h>
#include <helper_cuda_gl.h>
#include <texture_types.h>
#include <cuda_runtime.h>
#include <curand.h>

static texture<float, 3, cudaReadModeElementType> NoiseTextures[4];//texture Array
float *d_NoiseTest;//Device Array with random floats
int SizeNoiseTest = 32;
int sqrSizeNoiseTest = 32768;

void CreateTexture();

__global__ void AccesTexture(texture<float, 3, cudaReadModeElementType>* NoiseTextures)
{
        int test = tex3D(NoiseTextures[0],threadIdx.x,threadIdx.y,threadIdx.z);//by using this the error occurs
}

int
main(int argc, char **argv)
{
        CreateTexture();
}
void CreateTexture()
{
        //curand Random Generator (needs compiler link -lcurand)
        curandGenerator_t gen;
        cudaMalloc((void **)&d_NoiseTest, sqrSizeNoiseTest*4*sizeof(float));//Allocation of device Array
        curandCreateGenerator(&gen,CURAND_RNG_PSEUDO_DEFAULT);
        curandSetPseudoRandomGeneratorSeed(gen,1234ULL);
        curandGenerateUniform(gen, d_NoiseTest, sqrSizeNoiseTest*4);//writing data to d_NoiseTest
        curandDestroyGenerator(gen);

        //cudaArray Descriptor
        cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
        //cuda Array
        cudaArray *d_cuArr;
        cudaMalloc3DArray(&d_cuArr, &channelDesc, make_cudaExtent(SizeNoiseTest*sizeof(float),SizeNoiseTest,SizeNoiseTest), 0);
        cudaMemcpy3DParms copyParams = {0};

        //Loop for every separated Noise Texture (4 = 4)
        for(int i = 0; i < 4; i++){

                //initialize the textures
                NoiseTextures[i] = texture<float, 3, cudaReadModeElementType>(1,cudaFilterModeLinear,cudaAddressModeWrap,channelDesc);

                //Array creation
                //+(sqrSizeNoise*i) is to separate the created Noise Array into smaller parts with the size of SizeNoise^3
                copyParams.srcPtr   = make_cudaPitchedPtr(d_NoiseTest+(sqrSizeNoiseTest*i), SizeNoiseTest*sizeof(float), SizeNoiseTest, SizeNoiseTest);
                copyParams.dstArray = d_cuArr;
                copyParams.extent   = make_cudaExtent(SizeNoiseTest*sizeof(float),SizeNoiseTest,SizeNoiseTest);
                copyParams.kind     = cudaMemcpyDeviceToDevice;
                checkCudaErrors(cudaMemcpy3D(&copyParams));
                //Array creation End

                //new Bind
                // set texture parameters
                NoiseTextures[i].normalized = true;                      // access with normalized texture coordinates
                NoiseTextures[i].filterMode = cudaFilterModeLinear;      // linear interpolation
                NoiseTextures[i].addressMode[0] = cudaAddressModeWrap;   // wrap texture coordinates
                NoiseTextures[i].addressMode[1] = cudaAddressModeWrap;
                NoiseTextures[i].addressMode[2] = cudaAddressModeWrap;

                // bind array to 3D texture
                checkCudaErrors(cudaBindTextureToArray(NoiseTextures[i], d_cuArr, channelDesc));
                //end Bind
        }
        cudaFreeArray(d_cuArr);

        AccesTexture<<<1,dim3(4,4,4)>>>(NoiseTextures);
}

You need to link with -lcurand, though, and add CUDA-6.0/samples/common/inc to the include path.

I'm now getting a different error with this code:

code=11(cudaErrorInvalidValue) "cudaMemcpy3D(&copyParams)"

Even though it's exactly the same code as my original. I'm starting to get completely confused. Thank you for your help.

Here's a worked example showing the creation of an array of texture objects, roughly following the path of the code you provided. By comparing to the texture reference code I placed here, you can see that the first set of texture reads from the first texture object (i.e. the first kernel call) returns the same numerical values as the set of reads from the texture reference example (you may need to adjust the grid size of the two example codes to match).

Texture object usage requires compute capability 3.0 or higher.

Example:

$ cat t507.cu
#include <helper_cuda.h>
#include <curand.h>
#define NUM_TEX 4

const int SizeNoiseTest = 32;
const int cubeSizeNoiseTest = SizeNoiseTest*SizeNoiseTest*SizeNoiseTest;
static cudaTextureObject_t texNoise[NUM_TEX];

__global__ void AccesTexture(cudaTextureObject_t my_tex)
{
        float test = tex3D<float>(my_tex,(float)threadIdx.x,(float)threadIdx.y,(float)threadIdx.z);// fetch one value per thread using unnormalized coordinates
        printf("thread: %d,%d,%d, value: %f\n", threadIdx.x, threadIdx.y, threadIdx.z, test);
}

void CreateTexture()
{

    float *d_NoiseTest;//Device Array with random floats
    cudaMalloc((void **)&d_NoiseTest, cubeSizeNoiseTest*sizeof(float));//Allocation of device Array
    for (int i = 0; i < NUM_TEX; i++){
        //curand Random Generator (needs compiler link -lcurand)
        curandGenerator_t gen;
        curandCreateGenerator(&gen,CURAND_RNG_PSEUDO_DEFAULT);
        curandSetPseudoRandomGeneratorSeed(gen,1235ULL+i);
        curandGenerateUniform(gen, d_NoiseTest, cubeSizeNoiseTest);//writing data to d_NoiseTest
        curandDestroyGenerator(gen);

        //cudaArray Descriptor
        cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
        //cuda Array
        cudaArray *d_cuArr;
        // the array extent is given in elements, so the width carries no sizeof(float) factor
        checkCudaErrors(cudaMalloc3DArray(&d_cuArr, &channelDesc, make_cudaExtent(SizeNoiseTest,SizeNoiseTest,SizeNoiseTest), 0));
        cudaMemcpy3DParms copyParams = {0};


        //Array creation
        copyParams.srcPtr   = make_cudaPitchedPtr(d_NoiseTest, SizeNoiseTest*sizeof(float), SizeNoiseTest, SizeNoiseTest);
        copyParams.dstArray = d_cuArr;
        copyParams.extent   = make_cudaExtent(SizeNoiseTest,SizeNoiseTest,SizeNoiseTest);
        copyParams.kind     = cudaMemcpyDeviceToDevice;
        checkCudaErrors(cudaMemcpy3D(&copyParams));
        //Array creation End

        cudaResourceDesc    texRes;
        memset(&texRes, 0, sizeof(cudaResourceDesc));
        texRes.resType = cudaResourceTypeArray;
        texRes.res.array.array  = d_cuArr;
        cudaTextureDesc     texDescr;
        memset(&texDescr, 0, sizeof(cudaTextureDesc));
        texDescr.normalizedCoords = false;
        texDescr.filterMode = cudaFilterModeLinear;
        texDescr.addressMode[0] = cudaAddressModeClamp;   // clamp
        texDescr.addressMode[1] = cudaAddressModeClamp;
        texDescr.addressMode[2] = cudaAddressModeClamp;
        texDescr.readMode = cudaReadModeElementType;
        checkCudaErrors(cudaCreateTextureObject(&texNoise[i], &texRes, &texDescr, NULL));
    }
}

int main(int argc, char **argv)
{
        CreateTexture();
        AccesTexture<<<1,dim3(2,2,2)>>>(texNoise[0]);
        AccesTexture<<<1,dim3(2,2,2)>>>(texNoise[1]);
        AccesTexture<<<1,dim3(2,2,2)>>>(texNoise[2]);
        checkCudaErrors(cudaPeekAtLastError());
        checkCudaErrors(cudaDeviceSynchronize());
        return 0;
}

Compile with:

$ nvcc -arch=sm_30 -I/shared/apps/cuda/CUDA-v6.0.37/samples/common/inc -lcurand -o t507 t507.cu

Output:

$ cuda-memcheck ./t507
========= CUDA-MEMCHECK
thread: 0,0,0, value: 0.310691
thread: 1,0,0, value: 0.627906
thread: 0,1,0, value: 0.638900
thread: 1,1,0, value: 0.665186
thread: 0,0,1, value: 0.167465
thread: 1,0,1, value: 0.565227
thread: 0,1,1, value: 0.397606
thread: 1,1,1, value: 0.503013
thread: 0,0,0, value: 0.809163
thread: 1,0,0, value: 0.795669
thread: 0,1,0, value: 0.808565
thread: 1,1,0, value: 0.847564
thread: 0,0,1, value: 0.853998
thread: 1,0,1, value: 0.688446
thread: 0,1,1, value: 0.733255
thread: 1,1,1, value: 0.649379
thread: 0,0,0, value: 0.040824
thread: 1,0,0, value: 0.087417
thread: 0,1,0, value: 0.301392
thread: 1,1,0, value: 0.298669
thread: 0,0,1, value: 0.161962
thread: 1,0,1, value: 0.316443
thread: 0,1,1, value: 0.452077
thread: 1,1,1, value: 0.477722
========= ERROR SUMMARY: 0 errors

In this case I'm using the same kernel, called multiple times, to read from the individual texture objects. It should be possible to pass multiple objects to the same kernel; however, it is not advisable to have a single warp read from multiple textures if that can be avoided in your code. The actual issue resides at the quad level, which I'd prefer not to get into. It's best if you can arrange your code so that a warp is reading from the same texture object on any given cycle.
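One way to keep a warp on a single texture object is to choose the texture per warp rather than per thread. The following is a minimal host-side sketch of that index arithmetic, not code from the example above: the helper names are hypothetical, and the warp size of 32 is an assumption that holds on current NVIDIA GPUs (device code can read the built-in warpSize instead).

```cpp
// Assumption: 32 threads per warp.
constexpr int kWarpSize = 32;

// Linear index of a thread inside its block, in the order CUDA uses to pack
// threads into warps (x varies fastest, then y, then z).
int linearThreadIndex(int tx, int ty, int tz, int dimX, int dimY) {
    return tx + dimX * (ty + dimY * tz);
}

// Hypothetical helper: choose a texture slot per warp, so that all threads
// of one warp read from the same texture object.
int textureSlotForThread(int linearIndex, int numTextures) {
    return (linearIndex / kWarpSize) % numTextures;
}
```

Inside a kernel that receives several cudaTextureObject_t values, the same two expressions can be evaluated from threadIdx and blockDim to decide which object a thread passes to tex3D.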

Note that, for simplicity of presentation, this CreateTexture() function overwrites previously allocated device pointers such as d_cuArr during the processing of the loop. This isn't illegal or a functional issue, but it raises the possibility of memory leaks.

I assume you can modify the code to handle deallocation of those if this is a concern. The purpose of this code is to demonstrate the method to get things working.
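If the leak is a concern, one option is to record a release action each time the loop allocates something and run them all at shutdown. Below is a minimal host-side sketch of that bookkeeping. ReleaseList is a hypothetical helper, not part of CUDA, and the sketch deliberately makes no CUDA calls so it stands alone.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: stores one release action per resource and runs them
// in reverse order of registration (newest first).
class ReleaseList {
public:
    ReleaseList() = default;
    ReleaseList(const ReleaseList&) = delete;             // avoid running actions twice
    ReleaseList& operator=(const ReleaseList&) = delete;

    void add(std::function<void()> release) {
        actions_.push_back(std::move(release));
    }

    // Runs every pending action and returns how many were run.
    std::size_t releaseAll() {
        std::size_t count = 0;
        while (!actions_.empty()) {
            actions_.back()();
            actions_.pop_back();
            ++count;
        }
        return count;
    }

    ~ReleaseList() { releaseAll(); }

private:
    std::vector<std::function<void()>> actions_;
};
```

In CreateTexture() the registered actions would call cudaFreeArray on each d_cuArr and cudaDestroyTextureObject on each texNoise[i]. Because the list runs newest first, each texture object is destroyed before the array it reads from. The staging buffer d_NoiseTest can be released with cudaFree once the copies have completed.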

In cudaMalloc3DArray, the extent should be make_cudaExtent(SizeNoiseTest,SizeNoiseTest,SizeNoiseTest), not make_cudaExtent(SizeNoiseTest * sizeof(float),SizeNoiseTest,SizeNoiseTest).
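The arithmetic behind this correction can be sketched on the host. When a CUDA array is involved, extents are counted in elements, so multiplying the width by sizeof(float) describes a 128-wide volume instead of a 32-wide one. Extent3D and PitchedSource below are hypothetical stand-ins for illustration, not the CUDA runtime structs, and the fit check is only a plausible model of why cudaMemcpy3D reported code 11 in the question's full program.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the shape information in cudaExtent and
// cudaPitchedPtr; these are not the CUDA runtime types.
struct Extent3D { std::size_t width, height, depth; };
struct PitchedSource { std::size_t pitchBytes, rows, slices; };

// Number of elements a 3D array of this extent holds. The width is counted
// in elements, so no sizeof factor belongs in it.
std::size_t texelCount(const Extent3D& e) {
    return e.width * e.height * e.depth;
}

// True when a copy of extent `e`, with elements of `elemSize` bytes, fits
// inside the pitched source buffer. Assumption: with a CUDA array on one side
// of the copy, the extent width is measured in that array's elements.
bool copyFitsSource(const Extent3D& e, std::size_t elemSize, const PitchedSource& src) {
    return e.width * elemSize <= src.pitchBytes
        && e.height <= src.rows
        && e.depth <= src.slices;
}
```

With SizeNoiseTest = 32 and 4-byte floats, the source rows are 128 bytes long. An extent of (32, 32, 32) fits that source, while (128, 32, 32) asks for 512 bytes per row and does not.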
