Number of banks was doubled, but warp size is still 32 since sm2.X

Original post 2013-12-22 13:42:45 8 1 cuda

According to wiki/CUDA, the number of shared memory banks was doubled starting with sm2.X, but the warp size is still 32. As I read before, bank conflicts occur only within a half-warp, not across a full warp, so there should be no need for 32 banks to serve the 16 threads of a half-warp. So why was it doubled? Does this mean that, since sm2.X, CUDA works with full warps rather than half-warps, and the concept of a half-warp is no longer needed?

1 answer

The organization of shared memory and bank conflicts varies depending on the compute capability.

Please familiarize yourself with the CUDA documentation.

In particular, the programming guide.

In particular, the sections at the end of the programming guide discuss the architectural characteristics as they vary by compute capability, including shared memory and warp execution characteristics.

The warp size is always 32 across all current GPUs. Detailed warp execution characteristics vary between cc1.x and cc2.0 and newer devices.

"bank conflicts occur only per half-warp"

This concept applies to cc1.x devices only, where shared memory has 16 banks. On cc2.0 and newer devices, where shared memory has 32 banks, bank conflicts can be considered across an entire warp.

"why was it doubled?"

Since the execution characteristics on cc2.0 and newer devices are such that a full warp can be executed in lockstep, the shared memory was adjusted in terms of banks and access bandwidth so that a full warp of accesses could be serviced if they are not bank-conflicted.
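To make the bank arithmetic above concrete, here is a small host-side C++ model. It is only a sketch under stated assumptions, not a description of the hardware: the function name `conflict_degree` and its parameters are hypothetical, each thread `t` is assumed to read the 32-bit word at index `t * stride_words`, and successive 32-bit words are assumed to map to successive banks.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified model (an assumption, not a hardware spec): thread t reads the
// 32-bit word at index t * stride_words, and word w lives in bank w % num_banks.
// Returns the largest number of threads in one request that land in the same
// bank; 1 means the request is conflict-free.
int conflict_degree(int threads_per_request, int num_banks, int stride_words) {
    assert(threads_per_request > 0 && num_banks > 0 && stride_words >= 1);
    std::vector<int> hits(num_banks, 0);  // accesses counted per bank
    for (int t = 0; t < threads_per_request; ++t) {
        int bank = (t * stride_words) % num_banks;
        ++hits[bank];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

In this model, a half-warp against 16 banks with unit stride, `conflict_degree(16, 16, 1)`, gives 1, and so does a full warp against 32 banks, `conflict_degree(32, 32, 1)`. A full warp issued against only 16 banks, `conflict_degree(32, 16, 1)`, gives 2, which is one way to see why the bank count had to grow once a full warp was serviced together.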

solution1 2 ACCEPTED 2013-12-22 15:24:47