CUDA __host__ __device__ variables

In CUDA, the function type qualifiers __device__ and __host__ can be used together, in which case the function is compiled for both the host and the device. This eliminates copy-pasted duplicates. However, there is no such thing as a __host__ __device__ variable. I'm looking for an elegant way to do something like this:

__host__ __device__ const double common = 1.0;

__host__ __device__ void foo() {
    // ... access common
}

__host__ __device__ void bar() {
    // ... access common
}

I found that the following code compiles and runs without errors. (All results were obtained on Ubuntu 14.04 with CUDA 7.5 and gcc 4.8.4 as the host compiler.)

#include <iostream>

__device__ const double off = 1.0;

__host__ __device__ double sum(int a, int b) {
    return a + b + off;
}

int main() {
    double res = sum(1, 2);
    std::cout << res << std::endl;
    cudaDeviceReset();
    return 0;
}

$ nvcc main.cu -o main && ./main
4

In fact, nvcc --cuda main.cu translates the .cu file into this:

...
static const double off = (1.0);
# 5 "main.cu"
double sum(int a, int b) {
# 6 "main.cu"
return (a + b) + off;
# 7 "main.cu"
}
# 9 "main.cu"
int main() {
# 10 "main.cu"
double res = sum(1, 2);
# 11 "main.cu"
(((std::cout << res)) << (std::endl));
# 12 "main.cu"
cudaDeviceReset();
# 13 "main.cu"
return 0;
# 14 "main.cu"
}
...

Unsurprisingly, if the variable off is declared without the const qualifier ( __device__ double off = 1.0 ), I get the following output:

$ nvcc main.cu -o main && ./main
main.cu(7): warning: a __device__ variable "off" cannot be directly read in a host function

3

So, returning to the original question: can I rely on this behavior with a global __device__ const variable? If not, what are the other options?

UPD: By the way, the above behavior doesn't reproduce on Windows.

For ordinary floating point or integral types, it should be sufficient simply to mark the variable as const at global scope:

const double common = 1.0;

It should then be usable in any subsequent function, whether host, __host__ , __device__ , or __global__ .

This is supported by the CUDA documentation, subject to various restrictions:

Let 'V' denote a namespace scope variable or a class static member variable that has const qualified type and does not have execution space annotations (e.g., __device__ , __constant__ , __shared__ ). V is considered to be a host code variable.

The value of V may be directly used in device code, if V has been initialized with a constant expression before the point of use, and it has one of the following types:

  • builtin floating point type except when the Microsoft compiler is used as the host compiler,
  • builtin integral type.

Device source code cannot contain a reference to V or take the address of V.
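
To make the quoted rule concrete, here is a minimal sketch of an unannotated const global used by value from __host__ __device__ functions. The HD macro and the variable stride are assumptions made for this illustration and are not part of the original question; the macro only exists so that the same file also builds with a plain C++ compiler.

```cpp
// Under nvcc, __CUDACC__ is defined and HD expands to the real qualifiers;
// under a plain C++ compiler it expands to nothing (assumption of this sketch).
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// No execution space annotation, const qualified, initialized with a
// constant expression: the value may be used directly in device code.
// Per the quoted text, the floating point case is excluded when the
// Microsoft compiler is the host compiler.
const double common = 1.0;
const int stride = 4;  // hypothetical second constant (integral type)

// Both functions only read the *value* of the constants.
HD inline double foo(double x) { return x + common; }
HD inline int bar(int i) { return i * stride; }

// Per the quoted restriction, device code should not take the address of,
// or bind a reference to, such a variable. Something like the following is
// therefore expected to be rejected for the device side:
// HD inline const double* bad() { return &common; }
```

On the host, foo(2.0) evaluates to 3.0 and bar(3) to 12; in a kernel the same calls should see the same values, since the compiler uses the constants' values directly rather than reading host memory.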

In other cases, some possible options are:

  1. Use a constant defined by a preprocessor macro:

     #define COMMON 1.0
  2. Use templating, if the range of choices for the variable is discrete and limited.

  3. For other options/cases, it may be necessary to manage explicit host and device copies of the variable, e.g. using __constant__ memory on the device, and a corresponding copy on the host. Host and device paths within the __host__ __device__ function that accesses the variable could then differentiate behavior based on an nvcc compiler macro (e.g. #ifdef __CUDA_ARCH__ ...).
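
Option 3 could be sketched roughly as follows. This is only one possible arrangement under stated assumptions: the names h_common, d_common, get_common, and set_common and the HD macro are hypothetical, introduced here for illustration. The guards on __CUDACC__ let the file also build with a plain C++ compiler, in which case only the host copy exists.

```cpp
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Host copy of the value (hypothetical name).
static double h_common = 1.0;

#ifdef __CUDACC__
// Device copy in constant memory (hypothetical name). It is not
// initialized here; set_common() below fills it in.
__constant__ double d_common;
#endif

// Single accessor: __CUDA_ARCH__ is defined only during the device
// compilation pass, so each pass sees exactly one of the two branches.
HD inline double get_common() {
#ifdef __CUDA_ARCH__
    return d_common;  // device path reads constant memory
#else
    return h_common;  // host path reads the host copy
#endif
}

// The function from the question, now reading the value via the accessor.
HD inline double sum(int a, int b) { return a + b + get_common(); }

// Host-only helper that keeps both copies in sync.
// Returns false if the copy to the device fails.
inline bool set_common(double v) {
    h_common = v;
#ifdef __CUDACC__
    return cudaMemcpyToSymbol(d_common, &v, sizeof(v)) == cudaSuccess;
#else
    return true;  // no device copy exists in a plain C++ build
#endif
}
```

With this layout the value can change at run time, which the const and macro approaches do not allow, at the cost of having to call set_common before launching any kernel that depends on the value.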
