
CUDA - do I have to allocate and free memory each time?

I have a CUDA convolution kernel that is called very often (it is used for real-time rendering). Should I cudaMalloc and cudaFree every time I want to call the kernel? I tried storing the pointer returned by cudaMalloc and just cudaMemcpy'ing data before each kernel launch, but I ran into strange behavior (such as empty memory after the kernel finished).

I was also thinking about using pinned memory, but if I have to allocate and free it every time, it could even slow the application down. How should I proceed for a kernel that gets called very often?

No, there is no reason to malloc/free for each kernel call. Memory allocated with cudaMalloc remains valid until you free it. We have lots of code that runs many kernels on the same allocations, with and without cudaMemcpy calls in between to change the contents.
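For illustration, here is a minimal sketch of the allocate-once, reuse-across-launches pattern. The convolve kernel body, buffer size, frame count, and launch configuration are placeholders, not the asker's actual code:

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Placeholder kernel; stands in for the real convolution.
__global__ void convolve(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

int main()
{
    const int n = 1 << 20;                     // placeholder buffer size
    const size_t bytes = n * sizeof(float);

    float* h_in  = (float*)malloc(bytes);
    float* h_out = (float*)malloc(bytes);

    float *d_in, *d_out;
    cudaMalloc(&d_in,  bytes);                 // allocate once, at startup
    cudaMalloc(&d_out, bytes);

    for (int frame = 0; frame < 100; ++frame) {
        // Only the contents change each frame, not the allocations.
        cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
        convolve<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
        cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(d_in);                            // free once, at shutdown
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return 0;
}
```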

Your problem must be elsewhere. Try to boil it down to the smallest possible example that reproduces it and post the code.

It sounds like what you're doing should work.

Maybe you have a bug in your kernel. Try adding cudaThreadSynchronize and cudaGetLastError calls after the kernel launch to debug it.
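A minimal sketch of that kind of error checking is below; the kernel and launch configuration are placeholders. Note that cudaThreadSynchronize is deprecated in current CUDA releases, so the sketch uses cudaDeviceSynchronize, its modern equivalent:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Placeholder kernel; stands in for the real one.
__global__ void convolve(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

// Report both launch errors and errors raised while the kernel ran.
static void checkLastKernel(const char* label)
{
    cudaError_t err = cudaGetLastError();      // launch/configuration errors
    if (err != cudaSuccess)
        fprintf(stderr, "%s launch error: %s\n", label, cudaGetErrorString(err));

    err = cudaDeviceSynchronize();             // errors during kernel execution
    if (err != cudaSuccess)
        fprintf(stderr, "%s runtime error: %s\n", label, cudaGetErrorString(err));
}

int main()
{
    const int n = 1 << 20;
    float* d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    convolve<<<(n + 255) / 256, 256>>>(d_data, n);
    checkLastKernel("convolve");               // surface problems immediately

    cudaFree(d_data);
    return 0;
}
```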

Without more information, I can't offer you any more advice than that.
