CUDA sum across blocks

Hello, I am new to CUDA programming and I have a problem.

I have a variable, let's call it foo, stored in the shared memory of each block, with a different value in each block. I want a single thread to sum these values across all blocks. I thought of writing foo to global memory and then computing the sum, but is there a function that can do this more quickly?
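For reference, the approach described in the question (each block writes foo to global memory, then one thread adds the values up) might look like the minimal sketch below. The kernel and helper names (writeFooToGlobal, sumFooSingleThread, twoPassSum) and the stand-in value blockIdx.x + 1 are hypothetical; the real per-block computation of foo would replace it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Pass 1: every block computes its own shared-memory value foo, and one
// thread per block copies it to global memory.
__global__ void writeFooToGlobal(float* perBlockFoo)
{
    __shared__ float foo;
    if (threadIdx.x == 0) {
        // Hypothetical stand-in for the real per-block computation.
        foo = static_cast<float>(blockIdx.x + 1);
    }
    __syncthreads();  // make foo visible to every thread in the block

    if (threadIdx.x == 0) {
        perBlockFoo[blockIdx.x] = foo;
    }
}

// Pass 2: a single thread adds up all per-block values serially.
__global__ void sumFooSingleThread(const float* perBlockFoo, int numBlocks, float* total)
{
    float sum = 0.0f;
    for (int i = 0; i < numBlocks; ++i) {
        sum += perBlockFoo[i];
    }
    *total = sum;
}

// Host helper: runs both passes and returns the grid-wide sum
// (error checking omitted for brevity).
float twoPassSum(int numBlocks, int threadsPerBlock)
{
    float* dFoo = nullptr;
    float* dTotal = nullptr;
    cudaMalloc(&dFoo, numBlocks * sizeof(float));
    cudaMalloc(&dTotal, sizeof(float));

    writeFooToGlobal<<<numBlocks, threadsPerBlock>>>(dFoo);
    // Same (default) stream, so this launch starts after pass 1 finishes.
    sumFooSingleThread<<<1, 1>>>(dFoo, numBlocks, dTotal);

    float hTotal = 0.0f;
    // cudaMemcpy waits for the preceding kernels before copying.
    cudaMemcpy(&hTotal, dTotal, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dFoo);
    cudaFree(dTotal);
    return hTotal;
}

int main()
{
    float total = twoPassSum(64, 128);
    printf("total = %.1f\n", total);  // 1 + 2 + ... + 64 = 2080
    return total == 2080.0f ? 0 : 1;
}
```

This needs a second kernel launch and a serial loop over all blocks, which is the overhead the question is hoping to avoid.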

Thanks for your help.

It would be faster to have one thread in each block perform an atomicAdd() operation, adding the per-block value to a single, grid-wide variable in global memory.
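A minimal sketch of that suggestion, assuming foo is a single float in each block's shared memory. The names sumFooAtomic and atomicSum and the stand-in value blockIdx.x + 1 are hypothetical. The grid-wide total must be zeroed before the launch and read only after the kernel has finished.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each block computes foo in shared memory; thread 0 of each block then adds
// it to one grid-wide accumulator in global memory with atomicAdd().
__global__ void sumFooAtomic(float* total)
{
    __shared__ float foo;
    if (threadIdx.x == 0) {
        // Hypothetical stand-in for the real per-block computation.
        foo = static_cast<float>(blockIdx.x + 1);
    }
    __syncthreads();

    if (threadIdx.x == 0) {
        atomicAdd(total, foo);  // one atomic operation per block
    }
}

// Host helper: zeroes the accumulator, launches the kernel and returns the
// sum (error checking omitted for brevity).
float atomicSum(int numBlocks, int threadsPerBlock)
{
    float* dTotal = nullptr;
    cudaMalloc(&dTotal, sizeof(float));
    cudaMemset(dTotal, 0, sizeof(float));  // all-zero bytes == 0.0f

    sumFooAtomic<<<numBlocks, threadsPerBlock>>>(dTotal);

    float hTotal = 0.0f;
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&hTotal, dTotal, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dTotal);
    return hTotal;
}

int main()
{
    float total = atomicSum(64, 128);
    printf("total = %.1f\n", total);  // 1 + 2 + ... + 64 = 2080
    return total == 2080.0f ? 0 : 1;
}
```

The order in which blocks perform their atomicAdd() is not defined. For general floating-point values, the rounded result can therefore differ slightly from run to run. The integer-valued stand-ins used here are summed exactly regardless of order.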

See the relevant section of the CUDA C Programming Guide.

For a deeper exploration of optimizing reductions (i.e. summation), albeit not necessarily the one you want to perform, have a look at Mark Harris' presentation "Optimizing Parallel Reduction in CUDA".
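As a rough illustration of the kind of technique covered there, here is a sketch of a shared-memory tree reduction inside each block using sequential addressing (halving the active threads at each step). It finishes with one atomicAdd() per block, as suggested in the answer, rather than with a second kernel launch. The names reduceSum, reduceOnDevice and kThreads are hypothetical, and the block size is assumed to be a power of two.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // block size; must be a power of two here

// Each block sums its slice of `in` in shared memory, then thread 0 adds the
// block's partial sum to the grid-wide total. Launch with kThreads threads.
__global__ void reduceSum(const float* in, int n, float* total)
{
    __shared__ float sdata[kThreads];
    unsigned int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Load one element per thread; threads past the end contribute 0.
    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction: at each step the lower half adds in the upper half.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            sdata[tid] += sdata[tid + s];
        }
        __syncthreads();
    }

    if (tid == 0) {
        atomicAdd(total, sdata[0]);
    }
}

// Host helper: copies the data to the device and returns its sum
// (error checking omitted; assumes h is non-empty).
float reduceOnDevice(const std::vector<float>& h)
{
    int n = static_cast<int>(h.size());
    float* dIn = nullptr;
    float* dTotal = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dTotal, sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dTotal, 0, sizeof(float));

    int blocks = (n + kThreads - 1) / kThreads;
    reduceSum<<<blocks, kThreads>>>(dIn, n, dTotal);

    float hTotal = 0.0f;
    cudaMemcpy(&hTotal, dTotal, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dTotal);
    return hTotal;
}

int main()
{
    std::vector<float> data(1000, 1.0f);
    float total = reduceOnDevice(data);
    printf("total = %.1f\n", total);  // 1000 ones sum to 1000
    return total == 1000.0f ? 0 : 1;
}
```

The presentation goes further, with techniques such as loop unrolling and having each thread add several elements. This sketch only shows the basic shared-memory pattern.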
