
Cuda : mix c++ and cuda code

My problem is the following: I want to add CUDA code to an already existing C++ library and reuse my existing code as much as possible. In order to use polymorphism, I use template classes and template kernels. As such, everything is implemented in .cpp, .h and .cuh files. No .cu file is involved, so nvcc is never invoked, and the C++ compiler chokes on the <<< >>> kernel invocation syntax.

I have already seen "How to separate the kernel file CUDA with the main .cpp file" and "How to call a CUDA file from a C++ header file?", but I cannot find any design that would solve my problem.

The files involved:

main.cpp

Instantiates a bunch of my already existing classes and passes them to a CudaPrepare class, which composes them and is responsible for preparing the data to be passed to the CUDA code, using primitive types only.

#include "CudaPrepare.h"
#include "CudaSpecificType1.h"
#include "A.h" //already existing classes 
#include "B.h" //already existing classes

int main()
{
    A a(...);
    B b(...);
    CudaSpecificType1 cudaType(...);
    CudaPrepare<CudaSpecificType1> cudaPrepare(a, b, cudaType);
    cudaPrepare.run();

}

CudaSpecificType1.cuh

class CudaSpecificType1
{
protected:
    /*
    a few members
    */
public:
    CudaSpecificType1(...) : /*initializations*/ {}
    __device__ float polymorphicFunction(/*args*/); //__device__ so the kernel can call it
};

CudaPrepare.h

#include "A.h" //already existing classes
#include "B.h" //already existing classes
#include "CudaClass.cuh" //needed for CudaClass<T>; this is what drags the kernel into C++ compilation

template<typename T>
class CudaPrepare
{
protected:
    const A& a;
    const B& b;
    const T& t;
public:
    CudaPrepare(const A& a, const B& b, const T& t) : a(a), b(b), t(t) {/*some initialization stuff*/}
    void run() const
    {
        /*
        data preparation: various discretizations, sticking to primitive types only, casting to single precision, etc...
        */
        CudaClass<T> cudaClass(t, /*all the prepared data here*/);
        cudaClass.run();
    }
};

CudaClass.cuh

template <typename T>
__global__ void kernel(const T t, /*other args*/, float* results)
{
    int threadId = ...;
    results[threadId] = t.polymorphicFunction(...);
}

template<typename T>
class CudaClass
{
protected:
    const T& t;
    /*
    all the prepared data with primitive types
    */
public:
    CudaClass(const T& t, ...) : t(t) /*other initialization*/ {}
    void run() const
    {
        /*
        grid size calculation, cuda memory allocation, data transfer to device...
        */
        //kernel invocation
        kernel<T><<</*grid & block size*/>>>(/*args*/);
        /*
        clean up with cudaFree(...);
        */
    }
};

The C++ compiler gives an error at the kernel invocation, as expected. CudaClass::run() cannot simply be moved to a .cu file, since the class is templated. The only thing I can think of is to introduce a .cu file replacing main.cpp, or one containing a stub that would be called from main.cpp, but then nvcc cannot handle some C++11 features. In particular, A.h and B.h contain a lot of enum classes...
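For reference, the stub approach mentioned above can be sketched as follows for a non-template case (the kernel, function and file names here are hypothetical, purely for illustration):

// kernel_stub.cu -- compiled by nvcc
__global__ void scaleKernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// plain C++ entry point, declared in a header and called from main.cpp
void runScale(float* hostData, int n, float factor)
{
    float* devData = nullptr;
    cudaMalloc(&devData, n * sizeof(float));
    cudaMemcpy(devData, hostData, n * sizeof(float), cudaMemcpyHostToDevice);
    scaleKernel<<<(n + 255) / 256, 256>>>(devData, n, factor);
    cudaMemcpy(hostData, devData, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(devData);
}

main.cpp only ever sees the declaration void runScale(float*, int, float);, so the <<< >>> syntax never reaches the C++ compiler. But, as noted above, this does not work directly for templates, and any header included from the .cu file must still be digestible by nvcc.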

I experimented with CUDA 7.0 (was on 6.5 before). Sadly, there still seems to be no support for (at least) the following C++11 features:

  1. enum classes

  2. the final keyword

  3. range-based for loops

However, as suggested by Robert Crovella, explicit template instantiation solves the problem.

CudaClass.cuh must be split in two:

CudaClass.cuh

template <typename T>
__global__ void kernel(const T t, /*other args*/, float* results)
{
    int threadId = ...;
    results[threadId] = t.polymorphicFunction(...);
}

template<typename T>
class CudaClass
{
protected:
    const T& t;
    /*
    all the prepared data with primitive types
    */
public:
    CudaClass(const T& t, ...) : t(t) /*other initialization*/ {}

    void run() const; //definition moved to CudaClass.cu
};

CudaClass.cu

#include "CudaClass.cuh"
#include "CudaSpecificType1.cuh" //definitions of the types being instantiated

template<typename T>
void CudaClass<T>::run() const
{
    /*
    grid size calculation, cuda memory allocation, data transfer to device...
    */
    //kernel invocation
    kernel<T><<</*grid & block size*/>>>(/*args*/);
    /*
    clean up with cudaFree(...);
    */
}

//explicit instantiation, placed after the definition of run(), so that the
//kernel invocation lives in a .cu file compiled by nvcc
template class CudaClass<CudaSpecificType1>;
/*
other explicit instantiations for various types
*/
