
How to run a CUDA cooperative template kernel

I am trying, without success, to launch a template kernel as a cooperative kernel in CUDA C++. What am I doing wrong?

The error:


Error       cannot determine which instance of function template "boolPrepareKernel" is intended    
 

I try to invoke the kernel like this:

    ForBoolKernelArgs<int> fbArgs = ...;

    int device = 0;
    cudaDeviceProp deviceProp;
    cudaGetDeviceProperties(&deviceProp, device);
    cudaLaunchCooperativeKernel((void*)boolPrepareKernel, deviceProp.multiProcessorCount, fFArgs.threads, fbArgs);

The kernel is defined like this:

template <typename TYO>
__global__ void boolPrepareKernel(ForBoolKernelArgs<TYO> fbArgs) {
...
}

I tried parameterizing the launch (in this example with int) like this:

    cudaLaunchCooperativeKernel((void*)(<int>boolPrepareKernel), deviceProp.multiProcessorCount, fFArgs.threads, fbArgs) ;

but I get the error:

no instance of overloaded function matches the argument list            argument types are: (<error-type>, int, dim3, ForBoolKernelArgs<int>)

For the suggested form:

cudaLaunchCooperativeKernel((void*)(boolPrepareKernel<int>), deviceProp.multiProcessorCount, fFArgs.threads, fbArgs)

the error is:

 no instance of overloaded function matches the argument list            argument types are: (void *, int, dim3, ForBoolKernelArgs<int>)

This is probably something simple, but I am stuck. Thanks for any help!

For reference, an ordinary kernel launch like

boolPrepareKernel<<<fFArgs.blocks, fFArgs.threads>>>(fbArgs);

works, but of course grid synchronization is then unavailable.

Here is a minimal example that will compile:

$ cat t1954.cu
template <typename TYO>
struct ForBoolKernelArgs
{
    TYO val;
};

template <typename TYO>
__global__ void boolPrepareKernel(ForBoolKernelArgs<TYO> fbArgs) {
}


int main(){
  ForBoolKernelArgs<int> fbArgs;
  void *kernel_args[] = {&fbArgs};
  cudaLaunchCooperativeKernel((void*)(boolPrepareKernel<int>), 1, 1, kernel_args) ;
}
$ nvcc -o t1954 t1954.cu
$

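Probably the main issue you had remaining is that you were not passing the kernel arguments the way cudaLaunchCooperativeKernel expects. After the grid and block dimensions it takes a void** array holding one pointer per kernel argument, so you pass an array containing &fbArgs rather than fbArgs itself. The kernel also has to be named as an explicit instantiation, boolPrepareKernel<int>, so that the cast to void* refers to exactly one function.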
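To go one step beyond "it compiles", here is a hedged sketch of the same launch with an actual grid-wide barrier and basic error checking. The out member, the 128-thread block size, and the two-phase sum are assumptions added purely for illustration; they are not part of the original question. One block per SM is used, as in the question, which should fit as long as the device can keep at least one block of this size resident per SM.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

template <typename TYO>
struct ForBoolKernelArgs {
    TYO val;
    TYO *out;  // hypothetical device buffer, one slot per block (illustration only)
};

template <typename TYO>
__global__ void boolPrepareKernel(ForBoolKernelArgs<TYO> fbArgs) {
    cg::grid_group grid = cg::this_grid();

    // Phase 1: the first thread of each block writes that block's slot.
    if (threadIdx.x == 0) fbArgs.out[blockIdx.x] = fbArgs.val;

    // Grid-wide barrier: only valid when the kernel was launched cooperatively.
    grid.sync();

    // Phase 2: a single thread sums the slots written by all blocks.
    if (grid.thread_rank() == 0) {
        TYO sum = 0;
        for (unsigned int i = 0; i < gridDim.x; ++i) sum += fbArgs.out[i];
        fbArgs.out[0] = sum;
    }
}

int main() {
    int device = 0;
    cudaDeviceProp deviceProp;
    cudaGetDeviceProperties(&deviceProp, device);
    if (!deviceProp.cooperativeLaunch) {
        printf("cooperative launch is not supported on this device\n");
        return 1;
    }

    dim3 threads(128);                            // assumed block size
    dim3 blocks(deviceProp.multiProcessorCount);  // one block per SM, as in the question

    ForBoolKernelArgs<int> fbArgs;
    fbArgs.val = 1;
    cudaMalloc(&fbArgs.out, blocks.x * sizeof(int));

    // One pointer per kernel parameter, in declaration order.
    void *kernelArgs[] = { &fbArgs };

    cudaError_t err = cudaLaunchCooperativeKernel(
        (void*)(boolPrepareKernel<int>), blocks, threads, kernelArgs, 0, 0);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        cudaFree(fbArgs.out);
        return 1;
    }

    int result = 0;
    cudaMemcpy(&result, fbArgs.out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("sum = %d, number of blocks = %u\n", result, blocks.x);

    cudaFree(fbArgs.out);
    return 0;
}
```

Depending on your toolkit and GPU you may need to pass an -arch flag matching your device (grid synchronization needs compute capability 6.0 or newer), and older toolkits may also want -rdc=true. If the launch reports that the grid is too large, reduce the block count; cudaOccupancyMaxActiveBlocksPerMultiprocessor can tell you how many blocks of a given size fit on each SM.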
