
CUDA - dynamic shared memory triggers thrust::system::system_error

I just started to learn CUDA programming via Udacity. I got the following error as soon as I tried to use dynamic shared memory.

CUDA error at: main.cpp:55
invalid argument cudaGetLastError()
terminate called after throwing an instance of 'thrust::system::system_error'
what():  unload of CUDA runtime failed

We are unable to execute your code. Did you set the grid and/or block size correctly?

I searched quite a lot but still have no clue what goes wrong here. Interestingly, if I change the last two lines to

    compact_kernel<<<numBlocks, numThreadsPerBlock, sizeof(int)*1000>>>(d_inputVals, d_inputPos, d_outputVals, d_outputPos, numElems, 0);   
    compact_kernel<<<numBlocks, numThreadsPerBlock, sizeof(int)*1000>>>(d_inputVals, d_inputPos, &d_outputVals[numElems/2], &d_outputPos[numElems/2], numElems, 1); 

then no error is thrown when running the code. However, this does not make sense, since the space for dynamic shared memory allocation should not be limited to a constant. Maybe it is not my code but the settings on Udacity? The code I wrote is below. Any help would be greatly appreciated.

__global__ void compact_kernel(unsigned int* const d_inputVals,
    unsigned int* const d_inputPos,
    unsigned int* const d_outputVals,
    unsigned int* const d_outputPos,
    const size_t numElems,
    const size_t refBit)
{
    const size_t tid = blockIdx.x * blockDim.x + threadIdx.x;

    // predicate
    const bool predicate = (d_inputVals[tid] & 1) == refBit;
    extern __shared__ int s[];   
}

void your_sort(unsigned int* const d_inputVals,
    unsigned int* const d_inputPos,
    unsigned int* const d_outputVals,
    unsigned int* const d_outputPos,
    const size_t numElems)
{ 
    const size_t numBlocks = numElems/512;
    const size_t numThreadsPerBlock = 256;
    compact_kernel<<<numBlocks, numThreadsPerBlock, sizeof(int)*numElems>>>(d_inputVals, d_inputPos, d_outputVals, d_outputPos, numElems, 0);   
    compact_kernel<<<numBlocks, numThreadsPerBlock, sizeof(int)*numElems>>>(d_inputVals, d_inputPos, &d_outputVals[numElems/2], &d_outputPos[numElems/2], numElems, 1); 

}

EDIT: The value of numElems is 220480. Is this number too big for dynamic shared memory allocation?

According to the Programming Guide, shared memory is limited to 48 KB per thread block on all current CUDA devices. Here sizeof(int) * numElems = 4 * 220480 bytes ≈ 861 KB, which far exceeds that limit, so the kernel launch fails with "invalid argument".
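A minimal sketch of how you could verify this at runtime by querying the per-block shared memory limit before launching; the numElems value is taken from the EDIT above, everything else here is illustrative rather than part of the original code:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        // Query the properties of device 0 (assumed here to be the device in use).
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);

        const size_t numElems  = 220480;                  // value from the EDIT above
        const size_t requested = sizeof(int) * numElems;  // dynamic shared memory per block

        printf("sharedMemPerBlock = %zu bytes, requested = %zu bytes\n",
               prop.sharedMemPerBlock, requested);

        if (requested > prop.sharedMemPerBlock)
            printf("Requested dynamic shared memory exceeds the per-block limit;\n"
                   "such a launch fails with 'invalid argument'.\n");
        return 0;
    }

Checking cudaGetLastError() immediately after the kernel launch (as the course's error-checking macro does) is what surfaces this as the "invalid argument" message shown above.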

