
The number of shared memory banks was doubled since sm2.x, but the warp size is still 32

According to wiki/CUDA, the number of shared memory banks has been doubled since sm2.x, but the warp size is still 32. As I read before, bank conflicts occur only within a half-warp, not a full warp, so there is no need for 32 banks per 16 threads (a half-warp). So why was it doubled? Does it mean that since sm2.x CUDA works with full warps rather than half-warps, and the concept of a half-warp is no longer needed?

The organization of shared memory and bank conflicts varies depending on the compute capability.

Please familiarize yourself with the CUDA documentation, in particular the programming guide.

The sections at the end of the programming guide discuss the architectural characteristics as they vary by compute capability, including shared memory and warp execution characteristics.

The warp size is always 32 across all current GPUs. Detailed warp execution characteristics differ between cc1.x devices and cc2.0 and newer devices.

"there are bank conflicts only per half-warp"

This concept applies only to cc1.x devices, where shared memory has 16 banks. On cc2.0 and newer devices, where shared memory has 32 banks, bank conflicts can be considered across an entire warp.
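To make the half-warp versus full-warp distinction concrete, here is a small host-side C++ sketch (no GPU needed) that models bank conflicts. It assumes that successive 32-bit words map to successive banks and that threads reading the same word are served together. The names `bank_of` and `conflict_degree` are made up for this illustration, and this is a simplified model rather than a description of the actual hardware.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Assumed mapping: successive 32-bit words live in successive banks.
inline std::size_t bank_of(std::size_t word_index, std::size_t num_banks) {
    return word_index % num_banks;
}

// Worst-case number of distinct words that fall into a single bank when
// threads 0..group_size-1 each access word (tid * stride_words).
// A result of 1 means conflict-free; n means an n-way bank conflict.
inline std::size_t conflict_degree(std::size_t stride_words,
                                   std::size_t group_size,
                                   std::size_t num_banks) {
    std::vector<std::set<std::size_t>> words_per_bank(num_banks);
    for (std::size_t tid = 0; tid < group_size; ++tid) {
        const std::size_t word = tid * stride_words;
        words_per_bank[bank_of(word, num_banks)].insert(word);
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return worst;
}
```

Under this model, `conflict_degree(1, 16, 16)` (a cc1.x half-warp with unit stride) and `conflict_degree(1, 32, 32)` (a cc2.x full warp with unit stride) are both 1, while a stride of 2 words gives a 2-way conflict in either configuration.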

"why was it doubled?"

Since the execution characteristics on cc2.0 and newer devices are such that a full warp can be executed in lockstep, the shared memory was adjusted in terms of banks and access bandwidth so that a full warp of accesses could be serviced if they are not bank-conflicted.
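A back-of-the-envelope way to see this, assuming each bank can deliver one 32-bit word per pass (a simplification, not a hardware specification): 32 threads reading 32 distinct words need at least two passes with 16 banks, but can be served in one pass with 32 banks. The helper name `min_passes` is hypothetical.

```cpp
#include <cstddef>

// Lower bound on the number of passes needed to serve `threads` distinct
// 32-bit words, assuming each bank delivers at most one word per pass.
inline std::size_t min_passes(std::size_t threads, std::size_t num_banks) {
    return (threads + num_banks - 1) / num_banks;  // ceiling division
}
```

In this model `min_passes(32, 16)` is 2 and `min_passes(32, 32)` is 1, which matches the idea that 16 banks fit a half-warp at a time while 32 banks fit a full warp.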
