Using cuFFT device callbacks

Question

This is my first question, so I'll try to be as detailed as possible. I'm working on implementing a noise reduction algorithm in CUDA 6.5. My code is based on this Matlab implementation: http://pastebin.com/HLVq48C1.
I'd love to use the new cuFFT device callbacks feature, but I'm stuck on cufftXtSetCallback. Every time, my cufftResult is CUFFT_NOT_IMPLEMENTED (14). Even the example provided by NVIDIA fails the same way. My device callback test code:

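#include <cufft.h>
#include <cufftXt.h>   // declares cufftCallbackStoreC and cufftXtSetCallback

// Store callback: cuFFT calls it for each output element instead of writing
// the result itself. For testing, it overwrites every element with (offset, 2).
__device__ void noiseStampCallback(void *dataOut,
                                   size_t offset,
                                   cufftComplex element,
                                   void *callerInfo,
                                   void *sharedPointer) {
    element.x = (float)offset;
    element.y = 2;
    ((cufftComplex*)dataOut)[offset] = element;
}
// Device-side function pointer; its value is copied to the host with
// cudaMemcpyFromSymbol before being passed to cufftXtSetCallback.
__device__ cufftCallbackStoreC noiseStampCallbackPtr = noiseStampCallback;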
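To check whether a test callback like this one actually ran, it helps to know exactly what it should leave in the output buffer. A store callback is invoked per output element with that element's offset from the start of the output array, so after a successful cufftExecR2C every element i should hold (i, 2). The sketch below is a host-side reference under that assumption; `Complex` is a hypothetical stand-in for `cufftComplex` (a `float2`) so the snippet compiles without the CUDA toolkit, and `expectedStampOutput` is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cufftComplex (float2: x = real, y = imaginary).
struct Complex {
    float x;
    float y;
};

// Host-side reference for noiseStampCallback: it stores (offset, 2) at every
// output offset. numOutputElems is the total number of complex values the
// plan writes, across all batches.
std::vector<Complex> expectedStampOutput(std::size_t numOutputElems) {
    // The callback converts offset to float; beyond 2^24 not every integer
    // is exactly representable, so an exact comparison would break down.
    assert(numOutputElems <= (std::size_t{1} << 24));
    std::vector<Complex> out(numOutputElems);
    for (std::size_t offset = 0; offset < numOutputElems; ++offset) {
        out[offset].x = static_cast<float>(offset);
        out[offset].y = 2.0f;
    }
    return out;
}
```

Copying d_complex_waveSpec back with cudaMemcpy and comparing it element by element against this vector would show whether the callback was really applied.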

The CUDA part of my code:

cufftHandle forwardFFTPlan;//RtC
//find how many windows there are
int batch = targetFile->getNbrOfNoiseWindows();
size_t worksize;

cufftCreate(&forwardFFTPlan);
cufftMakePlan1d(forwardFFTPlan, WINDOW, CUFFT_R2C, batch, &worksize); //WINDOW = 2048 

//host memory, allocate
float *h_wave;
cufftComplex *h_complex_waveSpec;
unsigned int m_num_real_elems = batch*WINDOW*2;
h_wave = (float*)malloc(m_num_real_elems * sizeof(float));
h_complex_waveSpec = (cufftComplex*)malloc((m_num_real_elems/2+1)*sizeof(cufftComplex));

//init
memset(h_wave, 0, sizeof(float) * m_num_real_elems); //last window won't probably be full of file data, so fill memory with 0
memset(h_complex_waveSpec, 0, sizeof(cufftComplex) * (m_num_real_elems/2+1));
targetFile->getNoiseFile(h_wave); //fill h_wave with samples from sound file

//device memory, allocate, copy from host
float *d_wave;
cufftComplex *d_complex_waveSpec;

cudaMalloc((void**)&d_wave, m_num_real_elems * sizeof(float));
cudaMalloc((void**)&d_complex_waveSpec, (m_num_real_elems/2+1) * sizeof(cufftComplex));

cudaMemcpy(d_wave, h_wave, m_num_real_elems * sizeof(float), cudaMemcpyHostToDevice);

//prepare callback
cufftCallbackStoreC hostNoiseStampCallbackPtr;

cudaMemcpyFromSymbol(&hostNoiseStampCallbackPtr,
                          noiseStampCallbackPtr,
                          sizeof(hostNoiseStampCallbackPtr));

cufftResult status = cufftXtSetCallback(forwardFFTPlan,
                                        (void **)&hostNoiseStampCallbackPtr,
                                        CUFFT_CB_ST_COMPLEX,
                                        NULL);
//always returns status 14 - CUFFT_NOT_IMPLEMENTED

//run forward plan
cufftResult result = cufftExecR2C(forwardFFTPlan, d_wave, d_complex_waveSpec);
//the result looks fine when cufftXtSetCallback is not called
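For sizing the buffers above: a 1D real-to-complex transform of length n produces n/2 + 1 complex outputs (the remaining bins follow from Hermitian symmetry). So a batched plan made with cufftMakePlan1d(plan, WINDOW, CUFFT_R2C, batch, ...) reads batch*WINDOW contiguous real inputs and writes batch*(WINDOW/2 + 1) complex outputs. The question's buffers (batch*WINDOW*2 reals, batch*WINDOW + 1 complex values) are therefore large enough. The sketch below only computes the exact counts; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Complex outputs of one 1D R2C transform of length n.
std::size_t r2cOutputsPerTransform(std::size_t n) {
    assert(n > 0);
    return n / 2 + 1;
}

// Real inputs read by a batched 1D R2C plan (transforms spaced n apart).
std::size_t r2cInputElems(std::size_t n, std::size_t batch) {
    return n * batch;
}

// Complex outputs written by a batched 1D R2C plan (spaced n/2 + 1 apart).
std::size_t r2cOutputElems(std::size_t n, std::size_t batch) {
    return r2cOutputsPerTransform(n) * batch;
}
```

With WINDOW = 2048, each window yields 1025 complex bins, so r2cOutputElems(WINDOW, batch) is also the number of elements the store callback should touch.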

I'm aware that I'm just a beginner in CUDA. My question is:
How can I call cufftXtSetCallback properly, or what is the cause of this error?

Answer 1

Referring to the documentation:

The callback API is available in the statically linked cuFFT library only, and only on 64 bit LINUX operating systems. Use of this API requires a current license. Free evaluation licenses are available for registered developers until 6/30/2015. To learn more please visit the cuFFT developer page.

I think you are getting the "not implemented" error because either you are not on a 64-bit Linux platform, or you are not explicitly linking against the cuFFT static library. The Makefile in the cuFFT callback sample shows the correct way to link.

Even if you fix that issue, you will likely run into CUFFT_LICENSE_ERROR unless you have obtained one of the evaluation licenses.
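When a cuFFT call fails, printing the symbolic name instead of the raw number (like the 14 in the question) makes the cause easier to spot. The sketch maps numeric cufftResult values to their names as numbered in cufft.h (CUFFT_NOT_IMPLEMENTED = 14, CUFFT_LICENSE_ERROR = 15) and attaches the explanations from this answer to those two codes. It takes a plain int so it compiles without the CUDA headers; in real code you would pass the cufftResult directly. The function names are hypothetical, and the numbering should be checked against the cufft.h of your toolkit version.

```cpp
#include <string>

// Name for a numeric cufftResult value (numbering as in cufft.h).
const char* cufftResultName(int code) {
    static const char* const names[] = {
        "CUFFT_SUCCESS", "CUFFT_INVALID_PLAN", "CUFFT_ALLOC_FAILED",
        "CUFFT_INVALID_TYPE", "CUFFT_INVALID_VALUE", "CUFFT_INTERNAL_ERROR",
        "CUFFT_EXEC_FAILED", "CUFFT_SETUP_FAILED", "CUFFT_INVALID_SIZE",
        "CUFFT_UNALIGNED_DATA", "CUFFT_INCOMPLETE_PARAMETER_LIST",
        "CUFFT_INVALID_DEVICE", "CUFFT_PARSE_ERROR", "CUFFT_NO_WORKSPACE",
        "CUFFT_NOT_IMPLEMENTED", "CUFFT_LICENSE_ERROR"};
    const int count = static_cast<int>(sizeof(names) / sizeof(names[0]));
    return (code >= 0 && code < count) ? names[code] : "UNKNOWN_CUFFT_RESULT";
}

// Name plus a hint for the two callback-related failures discussed here.
std::string cufftCallbackHint(int code) {
    std::string msg = cufftResultName(code);
    if (code == 14) {
        msg += ": callbacks need the static cuFFT library on 64-bit Linux";
    } else if (code == 15) {
        msg += ": callbacks need a cuFFT license";
    }
    return msg;
}
```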

Note that linking against the cuFFT static library also comes with some device limitations. It should be possible to build a statically linked cuFFT application that runs on devices of compute capability (cc) 2.0 and higher.
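The compute capability limit above can be checked at startup. A minimal sketch, assuming the major/minor numbers come from cudaDeviceProp::major and cudaDeviceProp::minor as filled in by cudaGetDeviceProperties; the functions take plain ints so they compile without the toolkit, and their names are hypothetical.

```cpp
#include <cassert>

// True if compute capability major.minor meets the required minimum.
bool meetsComputeCapability(int major, int minor, int reqMajor, int reqMinor) {
    assert(major >= 0 && minor >= 0);
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

// Minimum stated in this answer for statically linked cuFFT: cc 2.0.
bool canRunStaticCufft(int major, int minor) {
    return meetsComputeCapability(major, minor, 2, 0);
}
```

In a .cu file this would be called as canRunStaticCufft(prop.major, prop.minor) after cudaGetDeviceProperties(&prop, device).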

Answer 2

A new (2019) possibility is cuFFT Device Extensions (cuFFTDX). Part of the Math Library Early Access program, they are device-side FFT functions that can be inlined into user kernels.

Announcement of cuFFTDX:

https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9240-cuda-new-features-and-beyond.pdf

Math Library Early Access:

https://developer.nvidia.com/cuda-math-library-early-access-program-page

Example code:

https://github.com/mnicely/cufft_examples

Answer 1: 3 votes, accepted, 2014-09-13 15:37:46
Answer 2: 1 vote, 2020-09-17 10:08:58
