Memory requirements for cufft

I have four cufftHandles, and I use cufftPlanMany to initialize each of them, all up front.
I'm using cufftGetSizeMany() to estimate the memory required for each of them.
Let's say that s0 is the size of the first one, s1 is the size of the second one, and so on.
I do the FFT and inverse FFT using those four plans, then at the end I destroy all of them together.

My question is: is the actual total memory required for those four plans equal to

total_size = s0 + s1 + s2 + s3

or

total_size = max(s0, s1, s2, s3)

Please note that I use only one of them at a time, but I create all of the plans together at the beginning and destroy them all together at the end.

A plan's work-area memory is only needed while that plan is participating in an exec call.

Note the documentation here:

"During plan execution, cuFFT requires a work area for temporary storage of intermediate results..."

I disagree with the other answer (or at least with the OP's interpretation of it in the comment on that answer).

Of course the memory is only required when the plan is executed; however, the memory is allocated when the plan is created (in auto-allocation mode, which is the default).

There are several places in the documentation which indicate this behaviour, e.g. here:

"Function cufftDestroy(): Frees all GPU resources associated with a cuFFT plan and destroys the internal plan data structure. This function should be called once a plan is no longer needed, to avoid wasting GPU memory."

I also verified (in the profiler timeline) that memory is allocated only on plan creation, and that there are no allocations on execution.


Solution

If you want to use only max(s0, s1, s2, s3) memory, you need to manage the work area yourself.

  • You need to set the allocation mode with cufftSetAutoAllocation(plan, false) before plan creation (that is, after cufftCreate() and before the cufftMakePlan*() call).
  • Then, after plan creation, you can get the required memory size with cufftGetSize() for each plan,
  • and use cufftSetWorkArea() to point all plans to the same memory location, allocated with the maximum size.
