
coalesced reads/writes in CUDA

Is there a way to check whether my kernel reads from and writes to global memory in a coalesced way? I've been trying to make sure my kernel accesses memory efficiently to get better performance.

Thanks

Use a profiler such as nvprof

The gld_efficiency and gst_efficiency metrics give you a direct measure of how well your global loads and stores are coalesced, expressed as a percentage. For example, on Linux:

nvprof --metrics gld_efficiency,gst_efficiency ./my_app
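To see what the two metrics look like for a known-good and a known-bad access pattern, a small test program can help. The following is only a minimal sketch, not code from the question: the kernel names, the `strided_index` helper, the array size and the stride of 32 are all assumptions chosen for illustration. The first kernel has consecutive threads touch consecutive floats; the second keeps its stores coalesced but makes consecutive threads load floats that are 32 elements apart.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: maps thread index i to an element index that is
// `stride` elements further along for each thread, wrapping around at n.
__host__ __device__ inline int strided_index(int i, int stride, int n) {
    return static_cast<int>((static_cast<long long>(i) * stride) % n);
}

// Coalesced: consecutive threads in a warp load and store consecutive floats.
__global__ void copy_coalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided loads: consecutive threads load floats `stride` elements apart,
// while the stores to out[i] stay coalesced.
__global__ void copy_strided(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[strided_index(i, stride, n)];
}

int main() {
    const int n = 1 << 22;                 // assumed problem size
    const size_t bytes = n * sizeof(float);

    float* in = nullptr;
    float* out = nullptr;
    if (cudaMalloc(&in, bytes) != cudaSuccess ||
        cudaMalloc(&out, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return EXIT_FAILURE;
    }
    cudaMemset(in, 0, bytes);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    copy_coalesced<<<blocks, threads>>>(in, out, n);
    copy_strided<<<blocks, threads>>>(in, out, n, 32);  // assumed stride

    // Wait for both kernels and report any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    cudaFree(in);
    cudaFree(out);
    return err == cudaSuccess ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

If this is built with nvcc as my_app and run under the nvprof command above on a GPU that nvprof supports, the metrics are reported per kernel. I would expect copy_strided to show a much lower gld_efficiency than copy_coalesced, while gst_efficiency should stay high for both, since only the loads are strided.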

