Handling a pointer to an image in CUDA

Question

I have been trying to get some CUDA code working (taken in part from Fractal Image Compression by Yuval Fisher) that uses a double pointer to a 2D image. After taking care of the pointer-to-pointer allocation as discussed in a related post, I am still getting a segmentation fault, along with the warning "Warning: Cannot tell what pointer points to, assuming global memory space". The entire code is posted below (my apologies for duplicating the posted code):

#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>
#define hsize 256
#define vsize 256

#define IMAGE_TYPE unsigned char


__global__ void kernel(IMAGE_TYPE matrixin[][hsize], IMAGE_TYPE matrixout[][hsize]) {
int tid=threadIdx.x;
int bid=blockIdx.x;

matrixout[bid][tid]=matrixin[bid][tid];
}

int fatal(char* s) {
fprintf(stderr,"%s\n",s);
return 1;
}

#define matrix_allocate(matrix,hsize,vsize,TYPE) {\
    TYPE *imptr;\
    int _i;\
    matrix=(TYPE**)malloc((vsize)*sizeof(TYPE*));\
    imptr=(TYPE*)malloc((long)(hsize)*(long)(vsize)*sizeof(TYPE));\
    if(imptr==NULL)\
    fatal("\nNo memory in matrix allocate.");\
    for(_i=0;_i<vsize;++_i,imptr+=hsize)\
    matrix[_i] = imptr;\
}\


int main() {
typedef IMAGE_TYPE IMarray[vsize][hsize];
IMAGE_TYPE **hin_image,**hout_image;

IMarray *din_image,*dout_image;


//allocate host memory
matrix_allocate(hin_image,hsize,vsize,IMAGE_TYPE)
for(int i=0;i<vsize;i++)
    for(int j=0;j<hsize;j++)
        hin_image[i][j]='a';

matrix_allocate(hout_image,hsize,vsize,IMAGE_TYPE)


//allocate device memory

cudaMalloc((void**)&din_image,(vsize*hsize)*sizeof(IMAGE_TYPE));
cudaMalloc((void**)&dout_image,(vsize*hsize)*sizeof(IMAGE_TYPE));

cudaMemcpy(din_image,hin_image, (vsize*hsize)*sizeof(IMAGE_TYPE),cudaMemcpyHostToDevice);

dim3 threads(hsize,1,1);
dim3 blocks(vsize,1,1);

kernel<<<blocks,threads>>>(din_image,dout_image);

cudaMemcpy(hout_image,dout_image,(vsize*hsize)*sizeof(IMAGE_TYPE),cudaMemcpyDeviceToHost);

for(int i=0;i<10;i++) {
    printf("\n");
    for(int j=0;j<10;j++)
        printf("%c\t",hout_image[i][j]);
}
printf("\n");

cudaFree(din_image);
cudaFree(dout_image);

free(hin_image);
free(hout_image);

return 0;
}

I would like to know what is wrong with the standard 2D access of the image inside the kernel function. Any help would be highly welcome.

Answer 1

I'm not going to try to sort out your complex matrix allocation scheme. The purpose of my suggestion was to let you simplify things to simple one-line allocations.

Furthermore, I don't think you really grasped the example I gave. It was a 3D example, and the typedefs had 2 subscripts. A 2D version would have typedefs with a single subscript.

Really none of this has to do with CUDA. It revolves around understanding C arrays and pointers.

These were the major changes I made to get your code working:

#include <stdio.h>
#include <stdlib.h>
#define hsize 256
#define vsize 256

#define IMAGE_TYPE unsigned char


__global__ void kernel(IMAGE_TYPE matrixin[][hsize], IMAGE_TYPE matrixout[][hsize]) {
  int tid=threadIdx.x;
  int bid=blockIdx.x;

  matrixout[bid][tid]=matrixin[bid][tid];
}

int fatal(char* s) {
  fprintf(stderr,"%s\n",s);
  return 1;
}


int main() {
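  // One image row. An IMarray* points to a contiguous array of rows and has
  // the same type as the kernel parameter IMAGE_TYPE [][hsize].
  typedef IMAGE_TYPE IMarray[hsize];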
  IMarray *hin_image,*hout_image;

  IMarray *din_image,*dout_image;


//allocate host memory
  hin_image = (IMarray *)malloc(hsize*vsize*sizeof(IMAGE_TYPE));
  hout_image = (IMarray *)malloc(hsize*vsize*sizeof(IMAGE_TYPE));

  for(int i=0;i<vsize;i++)
    for(int j=0;j<hsize;j++)
        hin_image[i][j]='a';


//allocate device memory

  cudaMalloc((void**)&din_image,(vsize*hsize)*sizeof(IMAGE_TYPE));
  cudaMalloc((void**)&dout_image,(vsize*hsize)*sizeof(IMAGE_TYPE));
  cudaMemset(dout_image, 0, (vsize*hsize)*sizeof(IMAGE_TYPE));
  cudaMemcpy(din_image,hin_image, (vsize*hsize)*sizeof(IMAGE_TYPE),cudaMemcpyHostToDevice);

  dim3 threads(hsize,1,1);
  dim3 blocks(vsize,1,1);

  kernel<<<blocks,threads>>>(din_image,dout_image);

  cudaMemcpy(hout_image,dout_image,(vsize*hsize)*sizeof(IMAGE_TYPE),cudaMemcpyDeviceToHost);

  for(int i=0;i<10;i++) {
    printf("\n");
    for(int j=0;j<10;j++)
        printf("%c\t",hout_image[i][j]);
  }
  printf("\n");

  cudaFree(din_image);
  cudaFree(dout_image);

  free(hin_image);
  free(hout_image);

  return 0;
}

Answer 1: accepted, 2014-04-24 07:00:33
