
Passing intrinsic function as template parameter

I'm trying to pass the atomicAdd function to another function as a template parameter.

Here is my first kernel:

template<typename T, typename TAtomic>
__global__ void myfunc1(T *address, TAtomic atomicFunc) {
    atomicFunc(address, 1);
}

Try 1:

myfunc1<<<1,1>>>(val.dev_ptr, atomicAdd);

It does not work because the compiler cannot match the expected function signature.

Try 2: First, I wrap atomicAdd in a custom function called MyAtomicAdd.

template<typename T>
__device__ void MyAtomicAdd(T *address, T val) {
    atomicAdd(address, val);
}

Then, I defined a function-pointer type called "TAtomic" and used it as the type of a template parameter.

typedef void (*TAtomic)(float *,float);

template<typename T, TAtomic atomicFunc>
__global__ void myfunc2(T *address) {
    atomicFunc(address, 1);
}

myfunc2<float, MyAtomicAdd><<<1,1>>>(dev_ptr);
CUDA_CHECK(cudaDeviceSynchronize());

Actually, Try 2 works. But I don't want to use a typedef; I need something more generic.

Try 3: Just pass MyAtomicAdd to myfunc1.

myfunc1<<<1,1>>>(dev_ptr, MyAtomicAdd<float>);
CUDA_CHECK(cudaDeviceSynchronize());

The compiler can compile the code. But when I run the program, an error is reported:

"ERROR in /home/liang/groute-dev/samples/framework/pagerank.cu:70: invalid program counter (76)"

I'm just wondering: why doesn't Try 3 work? And is there any simple or gentle way to implement this requirement? Thank you.

Try 3 doesn't work because you are attempting to take the address of a __device__ function in host code, which is illegal in CUDA:

myfunc1<<<1,1>>>(dev_ptr, MyAtomicAdd<float>);
                          ^
                          effectively a function pointer - address of a __device__ function

Such usage attempts in CUDA will resolve to some sort of an "address", but it is garbage, so when you try to use it as an actual function entry point in device code, you get the error you encountered: invalid program counter (or, in some cases, just illegal address).
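
(For reference, a usable __device__ function pointer has to be taken in device code and then copied back to the host, for example with cudaMemcpyFromSymbol. Below is a minimal sketch of that pattern; the fp_t, d_MyAtomicAdd, and h_fp names are only for illustration, and it pins the type to float through a function-pointer type, which is exactly what the question wants to avoid. The functor approach that follows sidesteps this entirely.)

#include <stdio.h>

template<typename T>
__device__ void MyAtomicAdd(T *address, T val) {
    atomicAdd(address, val);
}

typedef void (*fp_t)(float *, float);

// the address is taken while initializing a __device__ variable (device code),
// not in host code as in Try 3
__device__ fp_t d_MyAtomicAdd = MyAtomicAdd<float>;

template<typename T, typename TAtomic>
__global__ void myfunc1(T *address, TAtomic atomicFunc) {
    atomicFunc(address, (T)1);
}

int main(){
  float *dev_ptr;
  cudaMalloc(&dev_ptr, sizeof(float));
  cudaMemset(dev_ptr, 0, sizeof(float));
  // copy the device-side pointer value back to the host ...
  fp_t h_fp = NULL;
  cudaMemcpyFromSymbol(&h_fp, d_MyAtomicAdd, sizeof(fp_t));
  // ... and pass it to the kernel, which calls through it in device code
  myfunc1<<<1,1>>>(dev_ptr, h_fp);
  float h = 0;
  cudaMemcpy(&h, dev_ptr, sizeof(float), cudaMemcpyDeviceToHost);
  printf("h = %f\n", h);
  return 0;
}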

You can make your Try 3 method work (without a typedef) by wrapping the intrinsic in a functor instead of a bare __device__ function:

$ cat t48.cu
#include <stdio.h>

template<typename T>
__device__ void MyAtomicAdd(T *address, T val) {
    atomicAdd(address, val);
}


// functor wrapping the atomicAdd intrinsic; an object of this type can be
// passed to the kernel by value, so no device function address is needed
template <typename T>
struct myatomicadd
{
  __device__ T operator()(T *addr, T val){
    return atomicAdd(addr, val);
  }
};

template<typename T, typename TAtomic>
__global__ void myfunc1(T *address, TAtomic atomicFunc) {
    atomicFunc(address, (T)1);
}


int main(){

  int *dev_ptr;
  cudaMalloc(&dev_ptr, sizeof(int));
  cudaMemset(dev_ptr, 0, sizeof(int));
//  myfunc1<<<1,1>>>(dev_ptr, MyAtomicAdd<int>);
  myfunc1<<<1,1>>>(dev_ptr, myatomicadd<int>());
  int h = 0;
  cudaMemcpy(&h, dev_ptr, sizeof(int), cudaMemcpyDeviceToHost);
  printf("h = %d\n", h);
  return 0;
}
$ nvcc -arch=sm_35 -o t48 t48.cu
$ cuda-memcheck ./t48
========= CUDA-MEMCHECK
h = 1
========= ERROR SUMMARY: 0 errors
$

We can realize a slightly simpler version of this as well, letting the functor template type be inferred from the kernel template type:

$ cat t48.cu
#include <stdio.h>

struct myatomicadd
{
  // the templated operator() lets the element type be deduced at the call site
  template <typename T>
  __device__ T operator()(T *addr, T val){
    return atomicAdd(addr, val);
  }
};

template<typename T, typename TAtomic>
__global__ void myfunc1(T *address, TAtomic atomicFunc) {
    atomicFunc(address, (T)1);
}


int main(){

  int *dev_ptr;
  cudaMalloc(&dev_ptr, sizeof(int));
  cudaMemset(dev_ptr, 0, sizeof(int));
  myfunc1<<<1,1>>>(dev_ptr, myatomicadd());
  int h = 0;
  cudaMemcpy(&h, dev_ptr, sizeof(int), cudaMemcpyDeviceToHost);
  printf("h = %d\n", h);
  float *dev_ptrf;
  cudaMalloc(&dev_ptrf, sizeof(float));
  cudaMemset(dev_ptrf, 0, sizeof(float));
  myfunc1<<<1,1>>>(dev_ptrf, myatomicadd());
  float hf = 0;
  cudaMemcpy(&hf, dev_ptrf, sizeof(float), cudaMemcpyDeviceToHost);
  printf("hf = %f\n", hf);
  return 0;
}
$ nvcc -arch=sm_35 -o t48 t48.cu
$ cuda-memcheck ./t48
========= CUDA-MEMCHECK
h = 1
hf = 1.000000
========= ERROR SUMMARY: 0 errors
$
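
As an aside, on CUDA toolkits that support extended lambdas (nvcc invoked with --expt-extended-lambda, or -extended-lambda on newer versions), the same kernel can also accept a __device__ lambda instead of a functor. A minimal sketch, assuming a toolkit with extended-lambda support:

#include <stdio.h>

template<typename T, typename TAtomic>
__global__ void myfunc1(T *address, TAtomic atomicFunc) {
    atomicFunc(address, (T)1);
}

int main(){
  int *dev_ptr;
  cudaMalloc(&dev_ptr, sizeof(int));
  cudaMemset(dev_ptr, 0, sizeof(int));
  // an extended __device__ lambda is passed by value, just like the functor
  auto my_atomic = [] __device__ (int *addr, int val) { return atomicAdd(addr, val); };
  myfunc1<<<1,1>>>(dev_ptr, my_atomic);
  int h = 0;
  cudaMemcpy(&h, dev_ptr, sizeof(int), cudaMemcpyDeviceToHost);
  printf("h = %d\n", h);
  return 0;
}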

More treatments of the use of device function pointers in CUDA are linked to this answer.
