
Passing arrays/pointers as template parameters

I'm trying to create a template function of the following sort:

template <bool isHorizontal, float* kernel>
__global__ void smoothFilterColumns(const TwImageCUDA_Device* source,
                                    TwImageCUDA_Device* destination)
{
// code...
}

(Don't worry about the __global__; this is being implemented in CUDA.)

Unfortunately, it won't allow me to create instances of this function like this:

float ptrKernel[] = {1, 2, 1};
smoothFilterColumns<true, ptrKernel>(dxBuffer->cuda_image, dxOutput->cuda_image);

I've tried all sorts of float* and float[] variants, with and without the const modifier. Is it even possible to create a template of this sort?

Thanks in advance.

NB. The kernel is being passed as a template parameter and not a normal function parameter because that allows me to create more efficient code in CUDA by unrolling loops.

Update: Pointers to floats work as template parameters in standard C++, but apparently there is no way to get them to work with CUDA device functions, since those expect pointers to device addresses, and device addresses cannot be defined externally. If anyone has got that to work, please let me know.

I doubt you will get that to work. As others point out, the C++ standard says that any object or function passed as a template parameter must have external linkage (i.e. be visible outside the current translation unit). The problem is that CUDA doesn't currently support external linkage at all: every symbol used in device code must have internal linkage (i.e. be defined within the same translation unit). The underlying reason for this restriction is that CUDA does not currently have a linker for device code.

Please make sure ptrKernel has external linkage.

// static float ptrKernel[] = { ... };
// ^ won't work.

// const float ptrKernel[] = { ... };
// ^ won't work.

float ptrKernel[] = { ... };
// ^ ok.

void func() {
   // float ptrKernel[] = { ... };
   // ^ won't work (not global variable).
   ...
}

This is a restriction on non-type template parameters, as described in §14.3.2 [temp.arg.nontype]/1:

A template-argument for a non-type, non-template template-parameter shall be one of:

  • an integral constant-expression of integral or enumeration type; or
  • the name of a non-type template-parameter; or
  • the name of an object or function with external linkage, including function templates and function template-ids but excluding non-static class members, expressed as id-expression; or
  • the address of an object or function with external linkage, including function templates and function template-ids but excluding non-static class members, expressed as & id-expression, where the & is optional if the name refers to a function or array; or
  • a pointer to member expressed as described in 5.3.1.

I guess the ptrKernel variable you are passing as the template argument is a local variable. In any case, there is a restriction on what you can pass as a non-type template argument. According to the C++ standard (§14.3.2), the following are allowed:

  • integral constant expression of integral or enumeration type
  • name of a non-type template parameter
  • name of an object or function with external linkage
  • address of an object or function with external linkage
  • pointer to member

Make sure the ptrKernel variable meets these requirements (again, my guess is that it is not a variable with external linkage, i.e. a global or a static class member).

Will this work in CUDA?

template <bool isHorizontal, class Kernel>
__global__ void smoothFilterColumns(
    const TwImageCUDA_Device* source, TwImageCUDA_Device* destination)
{
    const float *kernel = Kernel::ptr();
    // code...
}

struct Kernel_1_2_1
{
    static const float *ptr()
    {
        static const float kernel[] = {1, 2, 1};
        return kernel;
    }
};

smoothFilterColumns<true, Kernel_1_2_1>(
    dxBuffer->cuda_image, dxOutput->cuda_image);

You might be able to make kernel a data member of the struct. And you might want to add a mechanism to pass the kernel size.

It's not going to work: you are trying to pass a CPU (host) RAM pointer to a GPU kernel.

You can do this in different ways: 1) embed all the constant values using multiple templates, one for each kernel length, or 2) create a functor class that handles the details of the transformation you want to apply.

Here is a working example to help you understand. Don't forget the __device__ specifier.

// with 3 ints
template<int amount, int k0, int k1, int k2>
__global__ void apply_kernel(const float *input, float *output) {
    // kernel body using k0..k2 as compile-time constants
}

// with 4 ints
template<int amount, int k0, int k1, int k2, int k3>
__global__ void apply_kernel(const float *input, float *output) {
    // ...
}

// with 5 ints
template<int amount, int k0, int k1, int k2, int k3, int k4>
__global__ void apply_kernel(const float *input, float *output) {
    // ...
}

class KernelOperator {
public:
      __host__ __device__ KernelOperator() {
      }
      __host__ __device__ int operator*(int value){
            return value * 2;
      }
};


// with a functor class
template<class T>
__global__ void apply_kernel(const float *input, float *output) {
    T value;
    // use value's operator* inside the kernel
}

int main() {
    apply_kernel<0, 1, 2, 1><<<10, 20>>>(NULL, NULL);

    apply_kernel<KernelOperator><<<10, 20>>>(NULL, NULL);
    return 0;
}
