简体   繁体   English

处理CUDA中指向Image的指针

[英]Handling Image pointer to pointer in CUDA

I have been trying to put in place a CUDA code (taken in part from Fractal Image Compression by Yuval Fisher) that has a double pointer to a 2D image. 我一直在尝试放置CUDA代码(部分内容来自Yuval Fisher的Fractal Image Compression),它具有指向2D图像的双指针。 After taking care of the pointer to pointer allocation in this , I am still getting segmentation fault error along with "Warning: Cannot tell what pointer points to, assuming global memory space" warning. 在照顾指针的指针来分配后 ,我仍然得到分段错误伴随着“警告:不能告诉指针指向,假设全球内存空间”的警告。 Here is the entire code . 这是完整的代码 I am also posting it here as under: (My apologies for duplicating the posted code) 我也将其发布在此处,如下所示:(我很抱歉复制发布的代码)

#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>
#define hsize 256
#define vsize 256

#define IMAGE_TYPE unsigned char


__global__ void kernel(IMAGE_TYPE matrixin[][hsize], IMAGE_TYPE matrixout[][hsize]) {
int tid=threadIdx.x;
int bid=blockIdx.x;

matrixout[bid][tid]=matrixin[bid][tid];
}

int fatal(char* s) {
fprintf(stderr,"%s\n",s);
return 1;
}

#define matrix_allocate(matrix,hsize,vsize,TYPE) {\
    TYPE *imptr;\
    int _i;\
    matrix=(TYPE**)malloc((vsize)*sizeof(TYPE*));\
    imptr=(TYPE*)malloc((long)(hsize)*(long)(vsize)*sizeof(TYPE));\
    if(imptr==NULL)\
    fatal("\nNo memory in matrix allocate.");\
    for(_i=0;_i<vsize;++_i,imptr+=hsize)\
    matrix[_i] = imptr;\
}\


int main() {
typedef IMAGE_TYPE IMarray[vsize][hsize];
IMAGE_TYPE **hin_image,**hout_image;

IMarray *din_image,*dout_image;


//allocate host memory
matrix_allocate(hin_image,hsize,vsize,IMAGE_TYPE)
for(int i=0;i<vsize;i++)
    for(int j=0;j<hsize;j++)
        hin_image[i][j]='a';

matrix_allocate(hout_image,hsize,vsize,IMAGE_TYPE)


//allocate device memory

cudaMalloc((void**)&din_image,(vsize*hsize)*sizeof(IMAGE_TYPE));
cudaMalloc((void**)&dout_image,(vsize*hsize)*sizeof(IMAGE_TYPE));

cudaMemcpy(din_image,hin_image, (vsize*hsize)*sizeof(IMAGE_TYPE),cudaMemcpyHostToDevice);

dim3 threads(hsize,1,1);
dim3 blocks(vsize,1,1);

kernel<<<blocks,threads>>>(din_image,dout_image);

cudaMemcpy(hout_image,dout_image,(vsize*hsize)*sizeof(IMAGE_TYPE),cudaMemcpyDeviceToHost);

for(int i=0;i<10;i++) {
    printf("\n");
    for(int j=0;j<10;j++)
        printf("%c\t",hout_image[i][j]);
}
printf("\n");

cudaFree(din_image);
cudaFree(dout_image);

free(hin_image);
free(hout_image);

return 0;
}

I intend to know what is wrong with the standard 2D access of image inside the kernel function. 我打算知道内核函数内部对图像的标准2D访问有什么问题。 Any help would be highly welcome. 任何帮助将是非常欢迎的。

I'm not going to try and sort out your complex matrix allocation scheme. 我不会尝试整理您的复杂矩阵分配方案。 The purpose of my suggestion was so that you can simplify things to simple 1-line allocations. 我建议的目的是使您可以将事情简化为简单的1行分配。

Furthermore, I don't think you really grasped the example I gave. 此外,我认为您并没有真正理解我所举的例子。 It was a 3D example, and the typedefs had 2 subscripts. 这是一个3D示例,typedef有2个下标。 A 2D version would have typedefs with a single subscript. 2D版本将具有带单个下标的typedef。

Really none of this has to do with CUDA. 确实,这与CUDA无关。 It revolves around understanding of C arrays and pointers. 它围绕对C数组和指针的理解而展开。

Those were the major changes I made to get your code working: 这些是我为使您的代码正常运行而进行的主要更改:

#include <stdio.h>
#include <stdlib.h>
#define hsize 256
#define vsize 256

#define IMAGE_TYPE unsigned char


__global__ void kernel(IMAGE_TYPE matrixin[][hsize], IMAGE_TYPE matrixout[][hsize]) {
  int tid=threadIdx.x;
  int bid=blockIdx.x;

  matrixout[bid][tid]=matrixin[bid][tid];
}

int fatal(char* s) {
  fprintf(stderr,"%s\n",s);
  return 1;
}


int main() {
  typedef IMAGE_TYPE IMarray[hsize];
  IMarray *hin_image,*hout_image;

  IMarray *din_image,*dout_image;


//allocate host memory
  hin_image = (IMarray *)malloc(hsize*vsize*sizeof(IMAGE_TYPE));
  hout_image = (IMarray *)malloc(hsize*vsize*sizeof(IMAGE_TYPE));

  for(int i=0;i<vsize;i++)
    for(int j=0;j<hsize;j++)
        hin_image[i][j]='a';


//allocate device memory

  cudaMalloc((void**)&din_image,(vsize*hsize)*sizeof(IMAGE_TYPE));
  cudaMalloc((void**)&dout_image,(vsize*hsize)*sizeof(IMAGE_TYPE));
  cudaMemset(dout_image, 0, (vsize*hsize)*sizeof(IMAGE_TYPE));
  cudaMemcpy(din_image,hin_image, (vsize*hsize)*sizeof(IMAGE_TYPE),cudaMemcpyHostToDevice);

  dim3 threads(hsize,1,1);
  dim3 blocks(vsize,1,1);

  kernel<<<blocks,threads>>>(din_image,dout_image);

  cudaMemcpy(hout_image,dout_image,(vsize*hsize)*sizeof(IMAGE_TYPE),cudaMemcpyDeviceToHost);

  for(int i=0;i<10;i++) {
    printf("\n");
    for(int j=0;j<10;j++)
        printf("%c\t",hout_image[i][j]);
  }
  printf("\n");

  cudaFree(din_image);
  cudaFree(dout_image);

  free(hin_image);
  free(hout_image);

  return 0;
}

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM