简体   繁体   中英

x64 vs x86 for CUDA

In this thread x64 allows less threads per block than Win32? there was a questions about running out of registers. I was under the impression the Nvidia has dropped support for x86 in CUDA 7.5 and beyond. This may be a foolish question but does that mean that all pointers are going to require two registers going forward? And that potentially less threads/block will be the way things work going forward?

This may be a foolish question but does that mean that all pointers are going to require two registers going forward?

Yes. All pointers in x64 mode will require 2 (32-bit) registers for storage.

And that potentially less threads/block will be the way things work going forward?

Certainly there should be no impact on the number of blocks that can be launched. Regarding threads, yes, there is potentially an impact on threads per block (since the product of threads per block launched times registers per thread must be lower than the machine limit), but as I stated in my answer to the question you linked, the limitation on threads can usually be worked around using one of several methods as mentioned there. Many kernels will not be impacted, because they are not "up against the limit". For those kernels that are "up against the limit", there are well established techniques to mitigate the effect and allow you to run the desired number of threads per block, up to 1024.

Ultimately this means the issue presented is not one of capability so much as it is one of performance optimization , which issue will always be present.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM