2D cuFFT transform

Question

This is my first question on Stack Overflow.

I'm new to CUDA.
I simply want to perform a 2D complex-to-complex FFT.
My input data is already prepared and needs no padding.
I just can't get the expected result. Here's my code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#include <cuda_runtime.h>
#include <cufft.h>

typedef float2 Complex;

#define M2 512          // number of rows
#define N2 2048         // number of columns

int main()
{
    int     i, j;
    FILE    *fp;
    char    *fmt = "%16e";

    // Allocate memory for h_input and h_output on host
    // and make sure each one is a single contiguous block

    Complex     **h_input, **h_output;

    h_input = (Complex **)malloc(M2*sizeof(Complex *));
    h_output= (Complex **)malloc(M2*sizeof(Complex *));

    h_input[0] = (Complex *)malloc(M2*N2*sizeof(Complex));
    h_output[0]= (Complex *)malloc(M2*N2*sizeof(Complex));

    for (i = 1; i < M2; i++){
        h_input[i] = h_input[i - 1] + N2;
        h_output[i]= h_output[i - 1] + N2;
    }

    // Load h_input from a file 
    if ((fp = fopen("INFLU_ORIGIN.DAT", "rt")) == NULL){
        printf("\nCannot open file, exiting!\n");
        return 1;   // fp is NULL here, so the reads below must not run
    }

    for (i = 0; i <= M2 - 1; i++){
        for (j = 0; j <= N2 - 1; j++){
            fscanf(fp, fmt, &h_input[i][j].x);
            h_input[i][j].y = 0.0;
        }
        fscanf(fp, "\n");   // skip the end of line ("%\n" is not a valid format)
    }

    fclose(fp);


    // allocate memory on device and copy h_input into d_array
    Complex     *d_array;
    size_t      host_orig_pitch = N2 * sizeof(Complex);
    size_t      pitch;

    cudaMallocPitch(&d_array, &pitch, N2 * sizeof(Complex), M2);

    cudaMemcpy2D(d_array, pitch, h_input[0], host_orig_pitch, 
        N2* sizeof(Complex), M2, cudaMemcpyHostToDevice);


    // Copy d_array back to host, and write it to a file
    // to check if they are as correctly copied into device

    cudaMemcpy2D(h_output[0], host_orig_pitch, d_array, pitch, 
        N2* sizeof(Complex), M2, cudaMemcpyDeviceToHost);

    if ((fp = fopen("INFLU_FFT_GET.DAT", "wt")) == NULL){
        printf("\nCannot create file, exiting!\n");
        return 1;   // fp is NULL here, so the writes below must not run
    }

    for (i = 0; i <= M2 - 1; i++){
        for (j = 0; j <= N2 - 1; j++){
            fprintf(fp, fmt, h_output[i][j].x);
        }
        fprintf(fp, "\n");
    }

    fclose(fp);


    // create CUFFT plan
    cufftHandle plan;
    cufftResult filter_result;

    filter_result = cufftPlan2d(&plan, M2, N2, CUFFT_C2C);

    if (filter_result != CUFFT_SUCCESS){
        printf("\n failed to create plan \n");
    }
    else{
        printf("\n succeeded in creating plan \n");
    }

    // perform forward FFT on d_array
    printf("\nTransforming influence coefficient cufftExecC2C\n");
    filter_result = cufftExecC2C(plan, (cufftComplex *)d_array, 
        (cufftComplex *)d_array, CUFFT_FORWARD);

    if (filter_result != CUFFT_SUCCESS){
        printf("\ntransform failed\n");
    }
    else{
        printf("\ntransform succeeded\n");
    }

    // Copy the fft result to host, write it to a file
    // to observe the result of FFT
    cudaMemcpy2D(h_output[0], host_orig_pitch, d_array, pitch, 
        N2* sizeof(Complex), M2, cudaMemcpyHostToDevice);

    if ((fp = fopen("INFLU_FFT_C.DAT", "wt")) == NULL){
        printf("\nCannot create file, exiting!\n");
        return 1;   // fp is NULL here, so the writes below must not run
    }

    for (i = 0; i <= M2-1; i++){
        for (j = 0; j <= N2-1; j++){
            fprintf(fp, fmt, h_output[i][j].x);
        }
        fprintf(fp, "\n");
    }

    fclose(fp);

    cufftDestroy(plan);

    free(h_input[0]);
    free(h_input);
    free(h_output[0]);
    free(h_output);
    cudaFree(d_array);

    cudaDeviceReset();

}

The workflow of this code is as follows:

(1) Allocate h_input and h_output on host
(2) Load data into h_input from a file -- "INFLU.DAT"
(3) Allocate d_array on device, and copy h_input into it
(4) Copy d_array back to h_output, write to file -- "INFLU_GET.DAT"
---- to see if d_array has received the correct data
(5) Perform a forward complex-to-complex FFT on d_array
(6) Copy d_array back to h_output, write to file -- "INFLU_FFT.DAT"
---- to observe the result of FFT
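
Steps (1), (3) and (4) depend on two layout ideas: a row-pointer table over one contiguous block, and a 2D copy between buffers whose rows may be spaced differently (the "pitch"). The sketch below imitates both on the host only, with no GPU involved. The names alloc_contiguous, free_contiguous and copy_2d are made up for illustration, and the local Complex struct is just a stand-in for float2; copy_2d only mimics the row-by-row semantics of cudaMemcpy2D and is not the CUDA function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { float x, y; } Complex;   /* stand-in for CUDA's float2 */

/* Allocate a rows x cols matrix as ONE contiguous block plus a table of
   row pointers, so m[i][j] and m[0][i * cols + j] name the same element. */
Complex **alloc_contiguous(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    Complex **m = malloc(rows * sizeof *m);
    if (m == NULL) return NULL;
    m[0] = calloc(rows * cols, sizeof **m);
    if (m[0] == NULL) { free(m); return NULL; }
    for (size_t i = 1; i < rows; i++)
        m[i] = m[i - 1] + cols;            /* next row starts cols elements later */
    return m;
}

void free_contiguous(Complex **m)
{
    if (m != NULL) { free(m[0]); free(m); }
}

/* Host-only imitation of a pitched 2D copy: copies `height` rows of
   `width_bytes` bytes each. Consecutive rows start `dpitch` bytes apart
   in dst and `spitch` bytes apart in src. */
void copy_2d(void *dst, size_t dpitch, const void *src, size_t spitch,
             size_t width_bytes, size_t height)
{
    assert(dpitch >= width_bytes && spitch >= width_bytes);
    for (size_t r = 0; r < height; r++)
        memcpy((char *)dst + r * dpitch,
               (const char *)src + r * spitch, width_bytes);
}
```

A round trip through a padded buffer (copy in with the padded pitch as the destination pitch, copy out with it as the source pitch) should return the original data, which is what step (4) verifies. One caveat, as far as I know: cufftPlan2d assumes tightly packed rows, so the FFT in step (5) is only safe if the pitch returned by cudaMallocPitch happens to equal N2 * sizeof(Complex). That is worth checking before trusting the result.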

From step (4), I'm sure that h_input is copied into d_array correctly.

My problem is:
In step (6), I found that after the FFT, d_array and h_output are still the same as the input.

The input file is:
https://drive.google.com/file/d/0B88U83cfBwMmdGFtbGJ2MVlURDg/view?usp=sharing
The filename is INFLU.DAT, size is 16MB.

I have a result file for comparison (produced in Fortran):
https://drive.google.com/file/d/0B88U83cfBwMmcDR1YzYyRzF4Mjg/view?usp=sharing
The filename is INFLU_FFT_F.DAT, size is also 16MB.

Any suggestion is welcome! Thanks!

Answer 1

The problem most likely comes from the last cudaMemcpy2D() call:

cudaMemcpy2D(h_output[0], host_orig_pitch, d_array, pitch, 
    N2* sizeof(Complex), M2, cudaMemcpyHostToDevice);

It asks for a copy from host to device, and my guess is that you meant to copy from device to host, just like you did a few lines above. With the wrong direction the FFT result probably never reaches h_output, which would explain why the file still shows the input:

cudaMemcpy2D(h_output[0], host_orig_pitch, d_array, pitch, 
    N2* sizeof(Complex), M2, cudaMemcpyDeviceToHost);
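
Because h_output was already filled in step (4), a later copy that silently does nothing leaves plausible-looking data behind. One way to make that visible is to overwrite the host buffer with a sentinel just before the final copy and count how many sentinels survive. This is a host-only sketch; poison and count_poisoned are made-up helper names, and the local Complex struct stands in for float2.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

typedef struct { float x, y; } Complex;   /* stand-in for CUDA's float2 */

/* Fill the buffer with NaN so that stale contents cannot be mistaken
   for fresh results. */
void poison(Complex *buf, size_t n)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++) {
        buf[i].x = NAN;
        buf[i].y = NAN;
    }
}

/* Count the elements whose real part is still the NaN sentinel,
   i.e. elements that were never overwritten after poison(). */
size_t count_poisoned(const Complex *buf, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (isnan(buf[i].x))
            count++;
    return count;
}
```

In the program above this would mean calling poison(h_output[0], M2 * N2) right before the last cudaMemcpy2D and checking that count_poisoned(h_output[0], M2 * N2) is zero afterwards. Checking the value returned by each cudaMemcpy2D call is a complementary habit, although I would not rely on a mismatched direction always being reported as an error.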
