
What's the best way of encapsulating CUDA kernels?

I'm trying to make a CUDA project as close to an OO design as possible. At the moment, the solution I found is to use a struct to encapsulate the data, and for each method that needs GPU processing, three functions are necessary:

  1. The method that will be called by the object.
  2. A __global__ function that will call a __device__ method of that struct.
  3. A __device__ method inside the struct.

I will give you an example. Let's say I need to implement a method to initialize a buffer inside a struct. It would look something like this:

struct Foo;
__global__ void initFooKernel(Foo foo); // forward declaration; defined below

struct Foo
{
   float *buffer_;
   short2 buffer_resolution_;
   short2 block_size_;

   __device__ void initBuffer()
   {
      int x = blockIdx.x * blockDim.x + threadIdx.x;
      int y = blockIdx.y * blockDim.y + threadIdx.y;
      int plain_index = (y * buffer_resolution_.x) + x;
      if (plain_index < buffer_resolution_.x * buffer_resolution_.y)
         buffer_[plain_index] = 0.0f;
   }

   void init(const short2 &buffer_resolution, const short2 &block_size)
   {
      buffer_resolution_ = buffer_resolution;
      block_size_ = block_size;
      //EDIT1 - Added the cudaMalloc (note: the size is in bytes)
      cudaMalloc((void **)&buffer_,
                 buffer_resolution.x * buffer_resolution.y * sizeof(float));
      dim3 threadsPerBlock(block_size.x, block_size.y);
      dim3 blocksPerGrid(buffer_resolution.x / threadsPerBlock.x,
                         buffer_resolution.y / threadsPerBlock.y);
      // Pass the struct by value: the kernel must not dereference a
      // host pointer, so a copy of the (small) struct goes to the device.
      initFooKernel<<<blocksPerGrid, threadsPerBlock>>>(*this);
   }
};

__global__ void initFooKernel(Foo foo)
{
   foo.initBuffer();
}

I need to do that because it looks like I can't declare a __global__ function inside the struct. I learned this pattern by looking at some open-source projects, but it seems very troublesome to write THREE functions for every encapsulated GPU method. So, my question is: is this the best/only approach possible? Is it even a VALID approach?

EDIT1: I forgot to include the cudaMalloc that allocates the buffer before initFooKernel is called. Fixed it.
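For reference, calling the struct above from host code would look roughly like this (a minimal sketch; the resolution and block sizes are made up for illustration):

```cuda
int main()
{
    Foo foo;
    short2 resolution = make_short2(64, 64); // 64x64 buffer
    short2 block = make_short2(16, 16);      // 16x16 threads per block

    foo.init(resolution, block); // launches initFooKernel internally
    cudaDeviceSynchronize();     // wait for the kernel to finish
    return 0;
}
```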

Is the goal to make classes that use CUDA while they look like normal classes from the outside?

If so, to expand on what O'Conbhui was saying, you can just create C style calls for the CUDA functionality and then create a class that wraps those calls.

So, in a .cu file, you would put definitions for texture references, kernels, C style functions that call the kernels and C style functions that allocate and free GPU memory. In your example, this would include a function that calls a kernel that initializes GPU memory.

Then, in a corresponding .cpp file, you import a header with declarations for the functions in the .cu file and you define your class. In the constructor, you call the .cu function that allocates CUDA memory and sets up other CUDA resources, such as textures, including your own memory initialization function. In the destructor, you call the functions that free the CUDA resources. In your member functions, you call the functions that call kernels.
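Putting the two files together, the split might look like the following sketch (the names foo_gpu.h, fooAllocBuffer, fooInitBuffer, and fooFreeBuffer are hypothetical, just to illustrate the pattern):

```cuda
// ---- foo_gpu.cu : all CUDA-specific code, compiled with nvcc ----

__global__ void initBufferKernel(float *buffer, int size)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < size)
        buffer[i] = 0.0f;
}

// C-style entry points, declared in a plain header (foo_gpu.h)
extern "C" float *fooAllocBuffer(int size)
{
    float *buffer = nullptr;
    cudaMalloc((void **)&buffer, size * sizeof(float));
    return buffer;
}

extern "C" void fooInitBuffer(float *buffer, int size)
{
    int threads = 256;
    int blocks = (size + threads - 1) / threads; // round up
    initBufferKernel<<<blocks, threads>>>(buffer, size);
}

extern "C" void fooFreeBuffer(float *buffer)
{
    cudaFree(buffer);
}

// ---- foo.cpp : an ordinary class, no CUDA headers needed ----
// #include "foo_gpu.h"  // declarations of the extern "C" functions above

class Foo
{
public:
    explicit Foo(int size) : size_(size), buffer_(fooAllocBuffer(size))
    {
        fooInitBuffer(buffer_, size_); // constructor sets up GPU state
    }
    ~Foo() { fooFreeBuffer(buffer_); } // destructor frees GPU state
private:
    int size_;
    float *buffer_; // device pointer, opaque to callers
};
```

From the outside, Foo behaves like any other C++ class; only the .cu file needs to be compiled with nvcc.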
