
CUDA global __device__ variable auto initialization

I'm declaring a global variable myvar on the device using the __device__ specifier. I don't set it to a meaningful value anywhere (not using cudaMemcpyToSymbol in my kernel launch method, as you would normally do).

I'd expect the value of myvar to be random garbage, but it's neatly 0.0 every time. Does CUDA do auto-initialisation of device variables?

I've also checked it using the CUDA debugger, and the value is indeed 0.

#include <cstdio>

__device__ float myvar;   // never explicitly initialized

__global__ void kernel(){
    printf("my var: %f\n", myvar);
}

int kernel_launch(){
    kernel<<<1,5>>>();
    cudaDeviceSynchronize();
    return 0;
}

CUDA does not automatically initialize any variables. It's just a coincidence of the CUDA implementation that myvar becomes zero in your test app.

In IEEE-754 floating point (used by NVIDIA GPUs), an all-zero bit pattern corresponds to 0.0, so it's a much more likely "random" value than, say, 1.0f.
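As a quick host-side illustration (not part of the original answer), reinterpreting an all-zero 32-bit pattern as a float yields exactly 0.0f:

#include <cstdio>
#include <cstring>
#include <cstdint>

int main(){
    uint32_t bits = 0u;                   // all-zero 32-bit pattern
    float f;
    std::memcpy(&f, &bits, sizeof f);     // reinterpret the bits as an IEEE-754 float
    printf("%f equals 0.0f: %d\n", f, f == 0.0f);   // prints "0.000000 equals 0.0f: 1"
    return 0;
}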

Don't infer the values of all your GPU memory based on the value in that single word...

I did a small experiment and was slightly surprised by the result though. I initialized myvar with __device__ float myvar(1.1f); and altered the printf() so that it prints both the value and the address of the variable. Then I ran it, got 1.1f output and noted the address. Then I removed the initialization and ran it again. This time, the value went back to 0.0f while the address stayed the same, showing that the chunk of memory in which this variable is located does get zeroed out as part of regular CUDA operations. For instance, this could happen if the CUDA program is copied to the GPU within a fixed-size chunk in which the other data is zero, and myvar is assigned to an address within this chunk.
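A minimal sketch of that experiment, assuming the same kernel_launch wrapper as in the question (the exact printf format string is mine):

#include <cstdio>

__device__ float myvar(1.1f);   // remove this initializer for the second run

__global__ void kernel(){
    // device-side printf can print both the value and the symbol's device address
    printf("my var: %f at %p\n", myvar, (void*)&myvar);
}

int kernel_launch(){
    kernel<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}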

Uninitialized __device__ variables, much like their global __host__ counterparts, need to be declared in the executable by their size and location in memory. As far as I know, such declarations always need a placeholder value, which unsurprisingly appears to be zero.

This can be checked readily. For example, this command disassembles the output of a simple __device__ int a; declaration:

nvcc -o test.o -c -x cu - <<< "__device__ int a;" && cuobjdump -xelf all test.o && nvdisasm *cubin

You'll get the following output:

    .headerflags    @"EF_CUDA_TEXMODE_UNIFIED EF_CUDA_64BIT_ADDRESS EF_CUDA_SM20 EF_CUDA_PTX_SM(EF_CUDA_SM20)"


//--------------------- .nv.constant14            --------------------------
    .section    .nv.constant14,"a",@progbits
    .align  4
    .align      8
.nv.constant14:
        /*0000*/    .dword  a


//--------------------- .nv.global                --------------------------
    .section    .nv.global,"aw",@nobits
    .align  4
    .type       a,@object
    .size       a,(.L_1 - a)
a:
.nv.global:
    .zero       4
.L_1:

where you can clearly see the implicit zero initialization (the .zero 4 directive).

However, I believe it would be unsafe to rely on this.
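If the value matters, it is safer to set it explicitly, either with a static initializer or from the host with cudaMemcpyToSymbol, which the question already mentions as the usual approach. A minimal sketch, with a hypothetical helper init_myvar:

#include <cstdio>

__device__ float myvar;

__global__ void kernel(){
    printf("my var: %f\n", myvar);
}

// Host side: copy an explicit value into the __device__ symbol before launching
void init_myvar(float value){
    cudaMemcpyToSymbol(myvar, &value, sizeof(value));
}

int main(){
    init_myvar(1.1f);            // explicit initialization instead of relying on zeroing
    kernel<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}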
