
OpenCL access shared local memory

I have written an OpenCL kernel that applies a low-pass filter to an image. The kernel:

__kernel void applyLowPassFilter(__global int *image, __global int *rst,
                                 __local int *localMem) {
  int nCols = get_global_size(0); // width of image
  int nRows = get_global_size(1); // height of image

  int xg = get_global_id(0); // x index of global buffer
  int yg = get_global_id(1); // y index of global buffer

  int xl = get_local_id(0); // x index of local buffer

  localMem[xl] = image[yg * nCols + xg];
  barrier(CLK_LOCAL_MEM_FENCE);
  if (yg != 0) {
    rst[yg * nCols + xg] = (localMem[xl] + image[(yg - 1) * nCols + xg]) / 2;
  }
}
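To check what the kernel should produce without any OpenCL setup, a plain C reference helps. The sketch below is an illustration, not part of the original code, and the name `low_pass_reference` is hypothetical. It mirrors the kernel: for every row except the first, each output pixel is the integer average of the pixel and the one directly above it. Row 0 of `rst` is left untouched, just as the kernel's `if (yg != 0)` leaves it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference for applyLowPassFilter.
 * image and rst are row-major W x H int buffers.
 * Row 0 of rst is not written, matching the kernel's `if (yg != 0)`. */
void low_pass_reference(const int *image, int *rst, size_t W, size_t H)
{
    assert(image != NULL && rst != NULL);
    for (size_t y = 1; y < H; ++y) {
        for (size_t x = 0; x < W; ++x) {
            /* same integer division as the kernel */
            rst[y * W + x] = (image[y * W + x] + image[(y - 1) * W + x]) / 2;
        }
    }
}
```

Comparing the buffer read back with `clEnqueueReadBuffer` against this output shows at once whether the device result is all zeros or actually filtered.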

In the kernel, I want each work-group to load its row of the image into local memory and compute the output from it. So I set the global work size to W*H (W: image width, H: image height) and the local work size to W*1, expecting each work-group to contain W work-items and there to be H work-groups. Host code:

    size_t globalItemSize[2];
    size_t localItemSize[2];
    globalItemSize[0] = W;
    globalItemSize[1] = H;
    localItemSize[0] = W;
    localItemSize[1] = 1;
    // Set cl kernel arguments.
    ret = clSetKernelArg(clKernel, 0, sizeof(cl_mem), (void *)&imageObj);
    ret = clSetKernelArg(clKernel, 1, sizeof(cl_mem), (void *)&rstObj);
    ret = clSetKernelArg(clKernel, 2, sizeof(int) * localItemSize[0], NULL); // local mem
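The configuration above can be sanity-checked on the host before enqueueing. In OpenCL 1.x, each local size must divide the corresponding global size evenly. The product of the local sizes must also not exceed the device's maximum work-group size, which `clGetDeviceInfo` reports as `CL_DEVICE_MAX_WORK_GROUP_SIZE` and `clinfo` also prints. The helper below is a hypothetical sketch of that check. It ignores the per-dimension limits (`CL_DEVICE_MAX_WORK_ITEM_SIZES`) and the per-kernel limit (`CL_KERNEL_WORK_GROUP_SIZE`). The image size 636 x 480 and the limit 256 used with it are example values only, not the author's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if a 2-D NDRange with these global and local
 * sizes satisfies the basic OpenCL 1.x work-group rules. */
bool ndrange_is_valid(const size_t global[2], const size_t local[2],
                      size_t max_work_group_size)
{
    assert(global != NULL && local != NULL);
    size_t items_per_group = 1;
    for (int d = 0; d < 2; ++d) {
        if (local[d] == 0 || global[d] % local[d] != 0)
            return false;          /* local size must divide global size */
        items_per_group *= local[d];
    }
    /* total work-items in one group must fit the device limit */
    return items_per_group <= max_work_group_size;
}
```

With W = 636 and a device limit of 256, the original `{W, 1}` local size fails this check, while `{W / 3, 1}` passes and yields 3 * H work-groups.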

However, the code doesn't work: the result image is always all zeros. Experimenting, I found that it works if I use only global memory and don't access local memory. Am I doing something wrong in the code that accesses local memory?

I figured it out by checking the OpenCL return error codes. First, the clSetKernelArg calls returned error -48 (CL_INVALID_KERNEL), which strongly suggested that something was wrong with my kernel. I then dropped the third kernel parameter, which I had been passing for local memory, and declared the buffer with the __local qualifier inside the kernel instead. This time I got error -51 (CL_INVALID_ARG_SIZE). That prompted me to check my hardware's limit on the number of work-items per work-group with the clinfo command. Once I found that W exceeded this limit, I changed localItemSize[0] from W to W / 3, and then it worked.
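Dividing by 3 works for one particular width. A more general approach, offered here as a suggestion and not what the answer above did, is to pick the largest divisor of W that does not exceed the device limit. `pick_local_size` is a hypothetical name, and `max_wg` stands for whatever `CL_DEVICE_MAX_WORK_GROUP_SIZE` reports on the target device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest local size that evenly divides `width`
 * and does not exceed `max_wg`. Falls back to 1 when no larger divisor
 * fits (e.g. width is prime and larger than max_wg). */
size_t pick_local_size(size_t width, size_t max_wg)
{
    assert(width > 0 && max_wg > 0);
    size_t best = 1;
    for (size_t d = 1; d <= width && d <= max_wg; ++d) {
        if (width % d == 0)
            best = d;      /* keep the largest divisor seen so far */
    }
    return best;
}
```

For a 636-pixel-wide image on a device with an example limit of 256, this returns 212, the same W / 3 the answer arrived at by hand.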

Kernel code after modification:

__kernel void applyLowPassFilter(__global int *image, __global int *rst) {
  int nCols = get_global_size(0); // width of image
  int nRows = get_global_size(1); // height of image

  int xg = get_global_id(0); // x index of global buffer
  int yg = get_global_id(1); // y index of global buffer

  int xl = get_local_id(0); // x index of local buffer

  __local int localMem[212]; // 1/3 of image width; must be >= localItemSize[0]
  localMem[xl] = image[yg * nCols + xg];
  barrier(CLK_LOCAL_MEM_FENCE);
  if (yg != 0) {
    rst[yg * nCols + xg] = (localMem[xl] + image[(yg - 1) * nCols + xg]) / 2;
  }
}
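The modified kernel still works when a row is split across several work-groups because `xl = get_local_id(0)` is relative to the work-group. The pixel's column is `xg`, which equals `get_group_id(0) * get_local_size(0) + xl` when there is no global offset. The C emulation below is a hypothetical illustration, not OpenCL code. It runs each work-group sequentially and copies its slice of the row into a stand-in `local_mem` buffer, as `localMem[xl] = image[...]` does, then computes the same average. Because each group's load loop finishes before its compute loop, the barrier is implicit. `MAX_LOCAL` is an arbitrary emulation limit.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCAL 1024  /* emulation limit on the local buffer (assumption) */

/* Emulates applyLowPassFilter with (local_size x 1) work-groups.
 * width must be a multiple of local_size, as OpenCL 1.x requires. */
void emulate_tiled_filter(const int *image, int *rst, size_t width,
                          size_t height, size_t local_size)
{
    assert(local_size > 0 && local_size <= MAX_LOCAL);
    assert(width % local_size == 0);
    int local_mem[MAX_LOCAL];              /* stands in for __local localMem */
    size_t groups_per_row = width / local_size;

    for (size_t yg = 0; yg < height; ++yg) {
        for (size_t group = 0; group < groups_per_row; ++group) {
            /* phase 1: every work-item loads one pixel into local memory */
            for (size_t xl = 0; xl < local_size; ++xl) {
                size_t xg = group * local_size + xl;   /* get_global_id(0) */
                local_mem[xl] = image[yg * width + xg];
            }
            /* barrier(CLK_LOCAL_MEM_FENCE): all loads are done here */
            /* phase 2: every work-item computes its output pixel */
            for (size_t xl = 0; xl < local_size; ++xl) {
                size_t xg = group * local_size + xl;
                if (yg != 0)
                    rst[yg * width + xg] =
                        (local_mem[xl] + image[(yg - 1) * width + xg]) / 2;
            }
        }
    }
}
```

Calling it with `local_size` equal to the full width or to a third of the width gives identical output. Shrinking the work-group therefore changes only how the work is partitioned, not the filtered result.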

Parameter configuration in the host code:

    size_t globalItemSize[2];
    size_t localItemSize[2];
    globalItemSize[0] = W;
    globalItemSize[1] = H;
    localItemSize[0] = W / 3; // must divide W evenly (OpenCL 1.x) and fit the device's max work-group size
    localItemSize[1] = 1;
    // Set cl kernel arguments.
    ret = clSetKernelArg(clKernel, 0, sizeof(cl_mem), (void *)&imageObj);
    ret = clSetKernelArg(clKernel, 1, sizeof(cl_mem), (void *)&rstObj);
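One fragile point in this fix is that `localMem[212]` hard-codes the work-group size, so a different image width or device could overflow the array. One common workaround, offered here as a suggestion rather than the author's approach, is to pass the size as a preprocessor macro in the `options` string of `clBuildProgram`, which accepts `-D name=value` build options. The kernel would then declare `__local int localMem[LOCAL_SIZE];`. `LOCAL_SIZE` and `make_build_options` are hypothetical names. The sketch only builds the string; the `clBuildProgram` call stays in the existing host code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "-D LOCAL_SIZE=<n>" into buf for use as the
 * options argument of clBuildProgram. Returns 0 on success, -1 if buf is
 * too small (the string is then truncated). The kernel would declare:
 *     __local int localMem[LOCAL_SIZE];                                */
int make_build_options(char *buf, size_t buf_len, size_t local_size)
{
    assert(buf != NULL && buf_len > 0);
    int n = snprintf(buf, buf_len, "-D LOCAL_SIZE=%zu", local_size);
    return (n < 0 || (size_t)n >= buf_len) ? -1 : 0;
}
```

The host would then pass the resulting string as the `options` argument, e.g. `clBuildProgram(program, 1, &deviceId, opts, NULL, NULL)` (where `program` and `deviceId` are placeholder names), using the same value it assigns to `localItemSize[0]`.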
