cuda fortran cufftPlanMany

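I am having trouble using cufftPlanMany. After creating the plans and running a forward and then an inverse FFT, I cannot get the original data back. A minimal version of the code is below.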

program test_cufft
  use cudafor
  use cufft

  integer :: plan_r2c
  integer :: plan_c2r
  real,allocatable,dimension(:,:,:,:), device :: eta_d
  complex,allocatable,dimension(:,:,:,:), device :: etak_d

  nv = 4
  nx = 256
  ny = 512
  nz = 512
  nx21 = nx/2+1

  allocate( eta_d(nv,nx,ny,nz) )
  allocate( etak_d(nv,nx21,ny,nz) )

  batch = nv;
  rank = 3;
  n = (/ nx, ny, nz /);
  idist = nx*ny*nz;
  odist = nx21*ny*nz;
  inembed = (/ nx, ny, nz /);
  onembed = (/ nx21, ny, nz /);
  istride = 1;
  ostride = 1;

  istat = cufftPlanMany( plan_r2c, rank, n, inembed, istride, idist, &
                         onembed, ostride, odist, CUFFT_R2C, batch )
  istat = cufftPlanMany( plan_c2r, rank, n, onembed, ostride, odist, &
                         inembed, istride, idist, CUFFT_C2R, batch )

  ! Initialize eta_d

  istat = cufftExecR2C( plan_r2c, eta_d, etak_d )
  istat = cufftExecC2R( plan_c2r, etak_d, eta_d )
  eta_d = eta_d/idist

end program test_cufft

The problem is that after the forward and inverse FFT I cannot recover the original data. What am I doing wrong? Should the data be ordered as eta_d(batch,nx,ny,nz) or as eta_d(nx,ny,nz,batch)?

I would say the correct ordering is (nz, ny, nx, batch).

However, it is also important to relate this to your array indexing and storage order.

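In CUFFT terminology, for a 3D transform (*), the nz direction is the fastest-varying index. In typical usage (stride = 1), data that is adjacent in memory corresponds to adjacent elements along that direction of the transform. Because Fortran stores the first array index contiguously, nz has to be the first dimension of the array.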
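The correspondence between Fortran's column-major storage and the CUFFT index formula can be checked on the host, without a GPU. The sketch below is not part of the original answer. It assumes the rank-3 offset formula given in the Advanced Data Layout section of the CUFFT manual, and the names `cufft_layout_sketch`, `fortran_offset` and `cufft_offset` are hypothetical helpers introduced only for illustration.

```fortran
module cufft_layout_sketch
  implicit none
contains
  ! 0-based linear offset of element (iz,iy,ix,ib) in a Fortran array
  ! declared as a(nz,ny,nx,nv): 1-based indices, first index fastest.
  pure integer function fortran_offset(iz, iy, ix, ib, nz, ny, nx) result(off)
    integer, intent(in) :: iz, iy, ix, ib, nz, ny, nx
    off = (iz-1) + nz*((iy-1) + ny*((ix-1) + nx*(ib-1)))
  end function fortran_offset

  ! 0-based offset assumed from the CUFFT manual for a rank-3 plan:
  !   b*idist + ((x*inembed[1] + y)*inembed[2] + z)*istride   (C indexing)
  ! inembed(2), inembed(3) below are the same entries in Fortran indexing.
  pure integer function cufft_offset(x, y, z, b, inembed, istride, idist) result(off)
    integer, intent(in) :: x, y, z, b, inembed(3), istride, idist
    off = b*idist + ((x*inembed(2) + y)*inembed(3) + z)*istride
  end function cufft_offset
end module cufft_layout_sketch

program check_layout
  use cufft_layout_sketch
  implicit none
  integer, parameter :: nv = 4, nx = 8, ny = 8, nz = 4
  integer :: inembed(3), ix, iy, iz, ib, mismatches

  inembed = (/ nx, ny, nz /)   ! slowest to fastest, as passed to cufftPlanMany
  mismatches = 0
  do ib = 1, nv
    do ix = 1, nx
      do iy = 1, ny
        do iz = 1, nz
          if (fortran_offset(iz, iy, ix, ib, nz, ny, nx) /= &
              cufft_offset(ix-1, iy-1, iz-1, ib-1, inembed, 1, nx*ny*nz)) &
            mismatches = mismatches + 1
        end do
      end do
    end do
  end do
  print *, 'mismatches:', mismatches   ! zero when the two layouts agree
end program check_layout
```

With n = (/ nx, ny, nz /), istride = 1 and idist = nx*ny*nz, every element of an array declared as (nz,ny,nx,nv) lands on the offset CUFFT expects, so the mismatch count is zero. Declaring the array as (nv,nx,ny,nz) with the same plan parameters would not satisfy this check.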

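For R2C/C2R transform types, this direction (which I think of as the elements along a row, i.e. the "z" index is the column index) is also the direction of the multi-dimensional transform that gets "reduced" in the complex domain: the complex array holds only nz/2+1 elements along it.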
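To make the "reduced" dimension concrete, here is a small host-only sketch, not taken from the original answer, that derives the complex array shape from the real one. The module and function names (`r2c_shape_sketch`, `r2c_complex_shape`) are hypothetical, and the sketch assumes the array is declared with the fastest transform dimension first and the batch index last.

```fortran
module r2c_shape_sketch
  implicit none
contains
  ! Given the shape of the real array (fastest dimension first, batch last),
  ! return the shape of the complex array produced by an R2C transform.
  ! Only the first (fastest-varying) extent is reduced to n/2+1.
  pure function r2c_complex_shape(real_shape) result(cshape)
    integer, intent(in) :: real_shape(4)
    integer :: cshape(4)
    cshape = real_shape
    cshape(1) = real_shape(1)/2 + 1
  end function r2c_complex_shape
end module r2c_shape_sketch

program show_r2c_shapes
  use r2c_shape_sketch
  implicit none
  integer, parameter :: nv = 4, nx = 8, ny = 8, nz = 4
  integer :: rshape(4), cshape(4)

  rshape = (/ nz, ny, nx, nv /)
  cshape = r2c_complex_shape(rshape)
  print *, 'real shape   :', rshape
  print *, 'complex shape:', cshape
  ! Distances between consecutive batch members, in elements of each type.
  print *, 'idist =', product(rshape(1:3)), ' odist =', product(cshape(1:3))
end program show_r2c_shapes
```

For nz = 4 the complex extent is 3, so idist is 256 real elements and odist is 192 complex elements. In the question's code the halved extent nx21 sits in the second array dimension, which is neither contiguous nor the last entry of n.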

With that in mind, I would rewrite your code this way:

$ cat t4.cuf
program test_cufft
  use cudafor
  use cufft

  integer :: plan_r2c
  integer :: plan_c2r
  real,allocatable,dimension(:,:,:,:), managed :: eta_d
  complex,allocatable,dimension(:,:,:,:), managed :: etak_d
  integer :: n(3), inembed(3), onembed(3),rank,istride,idist,ostride,odist,batch

  nv = 4
  nx = 8
  ny = 8
  nz = 4
  nz21 = nz/2+1

  allocate( eta_d(nz,ny,nx,nv) )
  allocate( etak_d(nz21,ny,nx,nv) )

  batch = nv;
  rank = 3;
  n = (/ nx, ny, nz /);
  idist = nx*ny*nz;
  odist = nx*ny*nz21;
  inembed = (/ nx, ny, nz /);
  onembed = (/ nx, ny, nz21 /);
  istride = 1;
  ostride = 1;

  istat = cufftPlanMany( plan_r2c, rank, n, inembed, istride, idist, onembed, ostride, odist, CUFFT_R2C, batch )
  istat = cufftPlanMany( plan_c2r, rank, n, onembed, ostride, odist, inembed, istride, idist, CUFFT_C2R, batch )

  ! Initialize eta_d
  eta_d(:,:,:,:) = 1.0
  eta_d(1,1,1,2) = 2.0
  istat = cufftExecR2C( plan_r2c, eta_d, etak_d )
  istat = cudaDeviceSynchronize()
  eta_d(:,:,:,:) = 0.0
  istat = cufftExecC2R( plan_c2r, etak_d, eta_d )
  istat = cudaDeviceSynchronize()
  eta_d = eta_d/idist
  print *,eta_d(1,1,1,1)
  print *,eta_d(1,1,1,2)
end program test_cufft
$ nvfortran t4.cuf -lcufft
$ ./a.out
    1.000000
    2.000000
$

(NVIDIA HPC SDK 20.9, Tesla V100 GPU)

It seems to give the expected result for my simple test case.

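(*) For a 2D transform, the ny dimension is the fastest-varying one, and for a 1D transform the nx dimension is (of course) the fastest-varying one.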
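The footnote amounts to a simple rule: the dimension array passed to CUFFT lists the transform extents from slowest to fastest, which is the reverse of the order in which a Fortran array declares them. The sketch below illustrates that rule on the host only. It is an illustration under that assumption, and `cufft_dims_sketch` and `cufft_n` are hypothetical names, not part of the cufft module.

```fortran
module cufft_dims_sketch
  implicit none
contains
  ! Reverse the Fortran extents (fastest first, batch index excluded)
  ! to obtain the n array in CUFFT order (slowest first).
  pure function cufft_n(extents) result(n)
    integer, intent(in) :: extents(:)
    integer :: n(size(extents))
    n = extents(size(extents):1:-1)
  end function cufft_n
end module cufft_dims_sketch

program show_dims
  use cufft_dims_sketch
  implicit none
  integer, parameter :: nx = 16, ny = 8, nz = 4

  print *, '1D, array (nx,batch)      : n =', cufft_n((/ nx /))
  print *, '2D, array (ny,nx,batch)   : n =', cufft_n((/ ny, nx /))
  print *, '3D, array (nz,ny,nx,batch): n =', cufft_n((/ nz, ny, nx /))
end program show_dims
```

In each case the first Fortran dimension is the last entry of n, and for R2C/C2R plans it is also the dimension that is reduced to n/2+1 in the complex array.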

The Multidimensional Transforms and Advanced Data Layout sections of the CUFFT manual may also be useful reading.
