
declaring shared memory variables in a kernel

I have a question about how shared variables work.

When I declare a shared variable in a kernel like this: __shared__ int array1[N]; then the shared memory of every active block holds its own instance of array1 with size N. In other words, each active block has allocated N*sizeof(int) bytes of its shared memory, and N*sizeof(int) must be at most 16KB on a GPU with compute capability 1.3.
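As a minimal sketch of such a declaration (N and the kernel name are assumptions for illustration; shared arrays declared this way need a size known at compile time):

```cuda
#define N 64  // assumed compile-time constant

// Hypothetical kernel: each block that runs it gets its own copy
// of array1 in that block's shared memory, N * sizeof(int) bytes.
__global__ void kernel(int *out)
{
    __shared__ int array1[N];
    // ... per-block work on array1 ...
}
```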

So, assuming the above is correct and using 2D threads and 2D blocks assigned at host like this:

dim3 block_size(22,22);
dim3 grid_size(25,25);

I would have 25x25 instances of array1, each of size N*sizeof(int), and at most 22x22 threads could access the shared memory of any one block. This was my original question and it was answered.

Q: When I assign a value to array1

array1[0]=1;

then do all active blocks instantly assign that value in their own shared memory?

Each block will always allocate its own shared memory array. So, if you launch 25x25 blocks, you will ultimately create 25x25 arrays in shared memory.

It does not mean, however, that all those arrays will exist at the same time, because it is not guaranteed that all blocks exist at the same time. The number of active blocks depends on the actual model of the GPU the kernel is run on. The GPU will launch as many blocks as possible concurrently, and the extra blocks will run after previous ones finish their work.

The maximum of N*sizeof(int) depends on the Compute Capability of your card and the L1-cache configuration. It can be 16KB, 32KB, or 48KB.

To answer your last question - each shared array is visible to all threads belonging to the corresponding block, and only to them. In your case, each shared array will be visible to the corresponding 22x22 threads.
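A sketch of that point, using a hypothetical kernel: a single thread of each block performs the write, and a barrier then makes the value visible to the other threads of the same block. Blocks never see each other's copies of array1.

```cuda
#define N 64  // assumed size

__global__ void kernel(int *out)
{
    __shared__ int array1[N];  // one copy per block

    // Let one thread of the block perform the write ...
    if (threadIdx.x == 0 && threadIdx.y == 0)
        array1[0] = 1;

    // ... and synchronize so the write is visible to the
    // remaining threads of this block (and only this block).
    __syncthreads();

    // Every thread of this block now reads 1; threads of other
    // blocks read their own block's copy of array1[0].
    if (threadIdx.x == 1 && threadIdx.y == 0)
        out[blockIdx.y * gridDim.x + blockIdx.x] = array1[0];
}
```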

