
Coalesced access across blocks in CUDA?

Let us say we have 16 threads running on block 1 and another 16 threads running on block 2.

Each thread reads 1 double from memory: the 16 threads on block 1 need to read 16 doubles from memory addresses 0-127, and 16 threads on block 2 need to read from addresses 128-255.

I know that the memory reads for the 16 threads on block 1 can be done in one memory transaction because of coalesced accesses.

My question is, when we consider these two blocks, how many memory transactions do we need, one or two? In other words, can memory accesses by different blocks happen at the same time?
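The setup above boils down to simple index arithmetic. As a quick sketch (Python, not CUDA code), here is the byte address each thread touches under the usual global-index mapping `idx = blockIdx * blockDim + threadIdx`:

```python
# Sketch of the question's access pattern: 16 threads per block,
# each reading one 8-byte double from a contiguous array.
BLOCK_DIM = 16          # threads per block, as in the question
SIZEOF_DOUBLE = 8       # bytes

def thread_address(block_idx, thread_idx):
    """Byte address of the double read by a given thread."""
    return (block_idx * BLOCK_DIM + thread_idx) * SIZEOF_DOUBLE

# Block 0 covers bytes 0..127, block 1 covers bytes 128..255.
block0 = [thread_address(0, t) for t in range(BLOCK_DIM)]
block1 = [thread_address(1, t) for t in range(BLOCK_DIM)]
print(block0[0], block0[-1] + SIZEOF_DOUBLE - 1)   # 0 127
print(block1[0], block1[-1] + SIZEOF_DOUBLE - 1)   # 128 255
```

Each block's 16 accesses are contiguous and aligned, which is why the accesses within one block coalesce; the question is whether the two blocks' ranges can be served together.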

Blocks are entirely independent - the hardware may choose (and likely will) to launch them on different multiprocessors.

Threads from different blocks will run in different warps, so it is impossible to coalesce memory accesses between them.

You need at least two memory transactions: the threads of the two blocks are guaranteed to be handled in different warps.

Furthermore, even if the threads had formed one warp, or occupied the same multiprocessor and shared the L1 cache, the addresses requested by a warp are converted into 128B or 32B lines (depending on caching/non-caching mode). So in caching mode you would need at least 2 transactions, and in non-caching mode 8 transactions. Look at this very useful presentation for a better understanding of global memory access.
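The transaction counts above can be checked by counting how many aligned segments a warp's addresses fall into. A minimal Python sketch (not CUDA code), assuming aligned 8-byte accesses and the 128B/32B segment sizes named in the answer:

```python
def transactions(addresses, segment_bytes):
    """Count the aligned segments of `segment_bytes` touched by a set of
    byte addresses, where each access reads 8 bytes (one double)."""
    segments = set()
    for a in addresses:
        segments.add(a // segment_bytes)        # segment holding the first byte
        segments.add((a + 7) // segment_bytes)  # segment holding the last byte
    return len(segments)

# One full warp: 32 threads reading consecutive doubles (bytes 0..255).
warp = [i * 8 for i in range(32)]
print(transactions(warp, 128))  # caching mode: 2 x 128B lines
print(transactions(warp, 32))   # non-caching mode: 8 x 32B segments
```

For the question's 16-thread half-range (bytes 0..127), the same count gives a single 128B transaction in caching mode, matching the coalescing claim in the question.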
