Why does my kernel's shared memory seem to be initialized to zero?
As mentioned in the question "Shared Memory Array Default Value", shared memory is not initialized, i.e. it can contain any value.
#include <stdio.h>
#define BLOCK_SIZE 512
__global__ void scan(float *input, float *output, int len) {
    // Deliberately never written: we only want to inspect its initial contents.
    __shared__ int data[BLOCK_SIZE];
    // DEBUG: a single thread of the first block dumps the whole array.
    if (threadIdx.x == 0 && blockIdx.x == 0)
    {
        printf("Block Number: %d\n", blockIdx.x);
        for (int i = 0; i < BLOCK_SIZE; ++i)
        {
            printf("DATA[%d] = %d\n", i, data[i]);
        }
    }
}
int main(int argc, char ** argv) {
    dim3 block(BLOCK_SIZE, 1, 1);
    dim3 grid(10, 1, 1);
    // The arguments are unused by the kernel; len is an int, so pass 0 rather than NULL.
    scan<<<grid,block>>>(NULL, NULL, 0);
    cudaDeviceSynchronize();
    return 0;
}
But why doesn't that hold for this code? I consistently get zeroed shared memory:
DATA[0] = 0
DATA[1] = 0
DATA[2] = 0
DATA[3] = 0
DATA[4] = 0
DATA[5] = 0
DATA[6] = 0
...
I tested in both Release and Debug mode, with "-O3 -arch=sm_20", "-O3 -arch=sm_30" and "-arch=sm_30". The result is always the same.
I think your conjecture that shared memory is initialized to 0 is questionable. Try the following code, which is a slight modification of yours. Here, I'm calling the kernel twice and altering the values of the data array. The first time the kernel is launched, the "uninitialized" values of data are all 0. The second time the kernel is launched, the "uninitialized" values of data are no longer all 0.
I think this happens because shared memory is SRAM, which exhibits data remanence.
#include <stdio.h>
#define BLOCK_SIZE 32
__global__ void scan(float *input, float *output, int len) {
    __shared__ int data[BLOCK_SIZE];
    if (threadIdx.x == 0 && blockIdx.x == 0)
    {
        for (int i = 0; i < BLOCK_SIZE; ++i)
        {
            // Print the value found on entry, then overwrite it.
            printf("DATA[%d] = %d\n", i, data[i]);
            data[i] = i;
        }
    }
}
int main(int argc, char ** argv) {
    dim3 block(BLOCK_SIZE, 1, 1);
    dim3 grid(10, 1, 1);
    // Launch twice: the second launch may see what the first one left behind.
    scan<<<grid,block>>>(NULL, NULL, 0);
    scan<<<grid,block>>>(NULL, NULL, 0);
    cudaDeviceSynchronize();
    getchar();
    return 0;
}
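Since the initial contents cannot be relied on, a kernel that needs zeroed shared memory should write the zeros itself. The following is only a minimal sketch of that pattern, not part of the original code: the kernel name zero_then_check and the host helper count_nonzero_after_init are hypothetical, and it assumes each block is launched with exactly BLOCK_SIZE threads so that every element gets one writer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 32

// Hypothetical kernel: each block zeroes its own shared array before reading
// it, then adds the number of nonzero elements it finds to a global counter.
__global__ void zero_then_check(int *nonzero_count) {
    __shared__ int data[BLOCK_SIZE];

    // One element per thread (assumes blockDim.x == BLOCK_SIZE).
    data[threadIdx.x] = 0;
    // Barrier: all writes must complete before any thread reads the array.
    __syncthreads();

    if (threadIdx.x == 0) {
        int bad = 0;
        for (int i = 0; i < BLOCK_SIZE; ++i) {
            if (data[i] != 0) ++bad;
        }
        atomicAdd(nonzero_count, bad);
    }
}

// Hypothetical host helper: returns the total number of nonzero elements
// seen across all blocks, or -1 if a CUDA call fails.
int count_nonzero_after_init(int num_blocks) {
    int *d_count = NULL;
    int h_count = -1;
    if (cudaMalloc(&d_count, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemset(d_count, 0, sizeof(int)) != cudaSuccess) {
        cudaFree(d_count);
        return -1;
    }
    zero_then_check<<<num_blocks, BLOCK_SIZE>>>(d_count);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (cudaMemcpy(&h_count, d_count, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        h_count = -1;
    }
    cudaFree(d_count);
    return h_count;
}
```

Calling count_nonzero_after_init(10) from main should report no nonzero elements regardless of what earlier kernels left in shared memory, because every block overwrites its array and synchronizes before reading it.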