How to get the real and imaginary parts of a complex matrix separately in CUDA?
I'm trying to compute the FFT of a 2D array. The input is an NxM real matrix, so the output is also stored as an NxM matrix (the complex output, which would otherwise take 2xNxM values, fits in an NxM matrix thanks to Hermitian symmetry).

So I want to know whether there is a way in CUDA to extract the real and imaginary matrices separately. In OpenCV, the split function does this job. I'm looking for a similar function in CUDA, but I haven't found one yet.

Given below is my complete code:
#define NRANK 2
#define BATCH 10

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <cufft.h>
#include <stdio.h>
#include <iostream>
#include <vector>

using namespace std;

int main()
{
    const size_t NX = 4;
    const size_t NY = 5;

    // Input array - host side
    float b[NX][NY] = {
        {0.7943 , 0.6020 , 0.7482 , 0.9133 , 0.9961},
        {0.3112 , 0.2630 , 0.4505 , 0.1524 , 0.0782},
        {0.5285 , 0.6541 , 0.0838 , 0.8258 , 0.4427},
        {0.1656 , 0.6892 , 0.2290 , 0.5383 , 0.1067}
    };

    // Output array - host side
    float c[NX][NY] = { 0 };

    cufftHandle plan;
    cufftComplex *data; // Holds both the input and the output - device side
    int n[NRANK] = {NX, NY};

    // Allocate memory and copy from host to device
    cudaMalloc((void**)&data, sizeof(cufftComplex)*NX*(NY/2+1));
    for(int i=0; i<NX; ++i){
        // Copied row by row because my actual array is dynamically allocated;
        // here I've replaced it with a static 2D array to keep it simple.
        cudaMemcpy(reinterpret_cast<float*>(data) + i*NY, b[i], sizeof(float)*NY, cudaMemcpyHostToDevice);
    }

    // Perform the FFT
    cufftPlanMany(&plan, NRANK, n,NULL, 1, 0,NULL, 1, 0,CUFFT_R2C,BATCH);
    cufftSetCompatibilityMode(plan, CUFFT_COMPATIBILITY_NATIVE);
    cufftExecR2C(plan, (cufftReal*)data, data);
    cudaThreadSynchronize();
    cudaMemcpy(c, data, sizeof(float)*NX*NY, cudaMemcpyDeviceToHost);

    // Here c is a NxM matrix. I want to split it into 2 separate NxM matrices, each
    // holding the real or the imaginary component of the output
    // Here c is in
    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}
As suggested by JackOLanter, I modified the code as below, but the problem is still not solved.
float real_vec[NX][NY] = {0}; // host vector, real part
float imag_vec[NX][NY] = {0}; // host vector, imaginary part
cudaError cudaStat1 = cudaMemcpy2D (real_vec, sizeof(real_vec[0]), data, sizeof(data[0]),NY*sizeof(float2), NX, cudaMemcpyDeviceToHost);
cudaError cudaStat2 = cudaMemcpy2D (imag_vec, sizeof(imag_vec[0]),data + 1, sizeof(data[0]),NY*sizeof(float2), NX, cudaMemcpyDeviceToHost);
The error I get is 'invalid pitch argument', but I can't understand why. For the destination I use a pitch the size of 'float', while for the source I use the size of 'float2'.
Your question and your code do not make much sense to me.

The output of cufftExecR2C is an NX*(NY/2+1) float2 matrix, which can be interpreted as an NX*(NY+2) float matrix (for even NY). Accordingly, you are not allocating enough space for c (which is only NX*NY float) for the last cudaMemcpy. You would still need one more complex memory location for the continuous (DC) component of the output.

Your question does not seem to be specific to the cufftExecR2C command, but is much more general: how can I split a complex NX*NY matrix into 2 NX*NY real matrices containing the real and imaginary parts, respectively?

If I correctly interpret your question, then the solution proposed by @njuffa at
Copying data to “cufftComplex” data struct?
could be a good clue to you.
EDIT

Below is a small example of how to "disassemble" and "reassemble" the real and imaginary parts of a complex vector when copying it between host and device. Please add your own CUDA error checking.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 16

int main() {
    // Declare, allocate and initialize a complex host vector
    float2* b = (float2*)malloc(N*sizeof(float2));
    printf("ORIGINAL DATA\n");
    for (int i=0; i<N; i++) {
        b[i].x = (float)i;
        b[i].y = 2.f*(float)i;
        printf("%f %f\n",b[i].x,b[i].y);
    }
    printf("\n\n");

    // Declare and allocate a complex device vector
    float2 *data; cudaMalloc((void**)&data, sizeof(float2)*N);

    // Copy the complex host vector to the device
    cudaMemcpy(data, b, N*sizeof(float2), cudaMemcpyHostToDevice);

    // Allocate host space for the real and imaginary parts of the complex vector
    float* cr = (float*)malloc(N*sizeof(float));
    float* ci = (float*)malloc(N*sizeof(float));

    /*******************************************************************/
    /* DISASSEMBLING THE COMPLEX DATA WHEN COPYING FROM DEVICE TO HOST */
    /*******************************************************************/
    // View the float2 buffer as interleaved floats: re0, im0, re1, im1, ...
    float* tmp_d = (float*)data;

    // N "rows" of one float each; the source pitch of 2 floats skips the other component
    cudaMemcpy2D(cr, sizeof(float), tmp_d, 2*sizeof(float), sizeof(float), N, cudaMemcpyDeviceToHost);
    cudaMemcpy2D(ci, sizeof(float), tmp_d+1, 2*sizeof(float), sizeof(float), N, cudaMemcpyDeviceToHost);

    printf("DISASSEMBLED REAL AND IMAGINARY PARTS\n");
    for (int i=0; i<N; i++)
        printf("cr[%i] = %f; ci[%i] = %f\n",i,cr[i],i,ci[i]);
    printf("\n\n");

    /******************************************************************************/
    /* REASSEMBLING THE REAL AND IMAGINARY PARTS WHEN COPYING FROM HOST TO DEVICE */
    /******************************************************************************/
    cudaMemcpy2D(tmp_d, 2*sizeof(float), cr, sizeof(float), sizeof(float), N, cudaMemcpyHostToDevice);
    cudaMemcpy2D(tmp_d + 1, 2*sizeof(float), ci, sizeof(float), sizeof(float), N, cudaMemcpyHostToDevice);

    // Copy the complex device vector back to the host
    cudaMemcpy(b, data, N*sizeof(float2), cudaMemcpyDeviceToHost);

    printf("REASSEMBLED DATA\n");
    for (int i=0; i<N; i++)
        printf("%f %f\n",b[i].x,b[i].y);
    printf("\n\n");

    // Release host and device memory
    free(b);
    free(cr);
    free(ci);
    cudaFree(data);

    getchar();
    return 0;
}