cudaMemcpyToSymbol just hangs and never returns. GPU processing at 100%. Code works fine on K40 but not on V100

I have the following code snippet:

__constant__ int baseLineX[4000];
__constant__ int baseLineY[4000];
__constant__ int guideLineX[4000];
__constant__ int guideLineY[4000];
__constant__ int rectangleOffsets[8];

__constant__ float blurKernel[64];

<other code>

for(int i = 0; i < 8; i++)
    hostRectangleOffsets[i] = i;

cudaMemcpyToSymbol(rectangleOffsets, hostRectangleOffsets, 8*sizeof(int));

This code works fine on a Tesla K40 but not on a 16GB Tesla V100. Even my laptop, which has a 4GB Quadro M2200 GPU, runs it without problems.

On the V100, the program hangs and never returns from the cudaMemcpyToSymbol call, yet the GPU still appears to be busy. Any ideas?

Well, you haven't provided a minimal, complete, verifiable example: your code doesn't compile and is missing statements, yet it has (apparently) irrelevant statements. So nobody can actually check.
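For reference, a minimal, complete version of the question's snippet might look like the sketch below. It keeps only the rectangleOffsets symbol, checks every return code, and reads the data back to confirm the copy. The helper names check and roundTripOffsets are hypothetical, and the read-back via cudaMemcpyFromSymbol is an addition that is not in the original post.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__constant__ int rectangleOffsets[8];

// Hypothetical helper (not in the original post): exit with a message on any CUDA error.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical helper: copy 8 ints into constant memory, read them back, compare.
bool roundTripOffsets() {
    int hostRectangleOffsets[8];
    for (int i = 0; i < 8; i++)
        hostRectangleOffsets[i] = i;

    check(cudaMemcpyToSymbol(rectangleOffsets, hostRectangleOffsets,
                             8 * sizeof(int)),
          "cudaMemcpyToSymbol");

    int readBack[8] = {0};
    check(cudaMemcpyFromSymbol(readBack, rectangleOffsets, 8 * sizeof(int)),
          "cudaMemcpyFromSymbol");

    for (int i = 0; i < 8; i++)
        if (readBack[i] != hostRectangleOffsets[i])
            return false;
    return true;
}

int main() {
    bool ok = roundTripOffsets();
    std::puts(ok ? "round trip OK" : "round trip MISMATCH");
    return ok ? 0 : 1;
}
```

If a file like this also hangs on the V100, the problem is reproducible without the rest of the program; if it does not, the cause probably lies in the code that was left out.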

I can still make several suggestions though:

  1. Try using the asynchronous version of this call: cudaMemcpyToSymbolAsync(). At least your program won't hang...
  2. Run your program or app in a debugger to begin with (e.g. NVIDIA's Nsight on most systems, or their extension to Visual Studio on Windows); alternatively, attach a debugger to the hanging process (MSVS instructions, Eclipse instructions - old).
  3. Run the process with core dumps enabled (if you're on a Unix-like system), kill it when it hangs, then open the core dump in a debugger; you'll at least get the backtrace.
  4. Try rebuilding your program with fewer optimizations enabled. This sometimes helps, at least for diagnostic purposes, and it can be combined with the previous suggestions.
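The first suggestion could be sketched roughly as follows, assuming the stall really is in the copy itself. The copy is issued on a stream and the host polls cudaStreamQuery() with a timeout rather than blocking. The helper names check and copyOffsetsWithTimeout, the 10 ms polling interval, and the use of pinned host memory are assumptions of this sketch, not part of the original code. If the stall occurs earlier, for example in the first CUDA call of the process, this will not help.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <cuda_runtime.h>

__constant__ int rectangleOffsets[8];

// Hypothetical helper: exit with a message on any CUDA error.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical helper: returns 0 if the copy completed, 1 if it timed out.
int copyOffsetsWithTimeout(int timeoutSeconds) {
    // Pinned host memory, so the copy can actually run asynchronously.
    int* hostRectangleOffsets = nullptr;
    check(cudaMallocHost(reinterpret_cast<void**>(&hostRectangleOffsets),
                         8 * sizeof(int)),
          "cudaMallocHost");
    for (int i = 0; i < 8; i++)
        hostRectangleOffsets[i] = i;

    cudaStream_t stream;
    check(cudaStreamCreate(&stream), "cudaStreamCreate");

    check(cudaMemcpyToSymbolAsync(rectangleOffsets, hostRectangleOffsets,
                                  8 * sizeof(int), 0,
                                  cudaMemcpyHostToDevice, stream),
          "cudaMemcpyToSymbolAsync");

    // Poll the stream instead of blocking, and give up after the timeout.
    auto deadline = std::chrono::steady_clock::now() +
                    std::chrono::seconds(timeoutSeconds);
    cudaError_t status;
    while ((status = cudaStreamQuery(stream)) == cudaErrorNotReady) {
        if (std::chrono::steady_clock::now() > deadline) {
            std::fprintf(stderr, "copy still pending after %d s\n",
                         timeoutSeconds);
            return 1;  // resources deliberately not freed: the stream is stuck
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    check(status, "cudaStreamQuery");

    check(cudaStreamDestroy(stream), "cudaStreamDestroy");
    check(cudaFreeHost(hostRectangleOffsets), "cudaFreeHost");
    return 0;
}

int main() {
    return copyOffsetsWithTimeout(10);
}
```

A timeout here at least turns a silent hang into a reportable failure, which makes it easier to attach a debugger or collect a backtrace as described in the other suggestions.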
