I have the following code snippet:
__constant__ int baseLineX[4000];
__constant__ int baseLineY[4000];
__constant__ int guideLineX[4000];
__constant__ int guideLineY[4000];
__constant__ int rectangleOffsets[8];
__constant__ float blurKernel[64];
<other code>
for (int i = 0; i < 8; i++)
    hostRectangleOffsets[i] = i;
cudaMemcpyToSymbol(rectangleOffsets, hostRectangleOffsets, 8*sizeof(int));
This code works fine on a Tesla K40 but not on a 16GB Tesla V100. (Even my laptop, which has a 4GB Quadro M2200 GPU, can run it.)
On the V100 the program just hangs: it never returns from the cudaMemcpyToSymbol call, although it looks like it is still being processed on the GPU. Any ideas?
Well, you haven't provided a minimal, complete, verifiable example: your code doesn't compile and is missing statements, yet has (apparently) irrelevant statements. So nobody can actually check.
I can still make several suggestions, though:
Try cudaMemcpyToSymbolAsync(). At least your program won't hang...
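To make the suggestion concrete, here is a minimal sketch of an asynchronous upload to rectangleOffsets with every return code checked. It is an illustration under assumptions, not the asker's actual program: the helpers report() and uploadRectangleOffsets() are hypothetical names, the host buffer is allocated with cudaMallocHost() because an "async" copy from ordinary pageable memory may still block the host, and the read-back through cudaMemcpyFromSymbol() is only there to verify the result. Note that if the underlying problem persists, the wait would most likely just move to cudaStreamSynchronize(), but checking each status at least shows which call fails or stalls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__constant__ int rectangleOffsets[8];

// Hypothetical helper: prints a message if a CUDA call failed and
// passes the status through unchanged.
static cudaError_t report(cudaError_t status, const char* what)
{
    if (status != cudaSuccess)
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
    return status;
}

// Hypothetical helper: uploads the values 0..7 into rectangleOffsets on a
// dedicated stream, waits for the stream, then copies the symbol back into
// readBack (which must have room for 8 ints) so the caller can verify it.
cudaError_t uploadRectangleOffsets(int* readBack)
{
    const size_t bytes = 8 * sizeof(int);
    int* hostRectangleOffsets = nullptr;
    cudaStream_t stream = nullptr;

    // Pinned host memory, so the copy can really be asynchronous.
    cudaError_t status = report(
        cudaMallocHost((void**)&hostRectangleOffsets, bytes), "cudaMallocHost");
    if (status != cudaSuccess)
        return status;

    status = report(cudaStreamCreate(&stream), "cudaStreamCreate");
    if (status != cudaSuccess) {
        cudaFreeHost(hostRectangleOffsets);
        return status;
    }

    for (int i = 0; i < 8; i++)
        hostRectangleOffsets[i] = i;

    // Enqueue the copy; this call only schedules the transfer.
    status = report(
        cudaMemcpyToSymbolAsync(rectangleOffsets, hostRectangleOffsets, bytes,
                                0, cudaMemcpyHostToDevice, stream),
        "cudaMemcpyToSymbolAsync");

    // Wait for the transfer to finish; errors from the copy surface here.
    if (status == cudaSuccess)
        status = report(cudaStreamSynchronize(stream), "cudaStreamSynchronize");

    // Read the symbol back for verification.
    if (status == cudaSuccess)
        status = report(cudaMemcpyFromSymbol(readBack, rectangleOffsets, bytes),
                        "cudaMemcpyFromSymbol");

    cudaStreamDestroy(stream);
    cudaFreeHost(hostRectangleOffsets);
    return status;
}
```

A caller would pass an int array of 8 elements to uploadRectangleOffsets() and compare the returned status against cudaSuccess before launching any kernel that reads rectangleOffsets.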