I have written an OpenCL kernel that applies a low-pass filter to an image. The kernel:
__kernel void applyLowPassFilter(__global int *image, __global int *rst,
__local int *localMem) {
int nCols = get_global_size(0); // width of image
int nRows = get_global_size(1); // height of image
int xg = get_global_id(0); // x index of global buffer
int yg = get_global_id(1); // y index of global buffer
int xl = get_local_id(0); // x index of local buffer
localMem[xl] = image[yg * nCols + xg];
barrier(CLK_LOCAL_MEM_FENCE);
if (yg != 0) {
rst[yg * nCols + xg] = (localMem[xl] + image[(yg - 1) * nCols + xg]) / 2;
}
}
In the kernel, I want each work-group to stage its pixels in local memory and compute the result from there. So I set the global work size to W x H (W: width of the image, H: height of the image) and the local work size to W x 1, expecting each work-group to contain W work-items and the number of work-groups to be H. Host code:
size_t globalItemSize[2];
size_t localItemSize[2];
globalItemSize[0] = W;
globalItemSize[1] = H;
localItemSize[0] = W;
localItemSize[1] = 1;
// Set cl kernel arguments.
ret = clSetKernelArg(clKernel, 0, sizeof(cl_mem), (void *)&imageObj);
ret = clSetKernelArg(clKernel, 1, sizeof(cl_mem), (void *)&rstObj);
ret = clSetKernelArg(clKernel, 2, sizeof(int) * localItemSize[0], NULL); // local mem
However, the code doesn't work and keeps producing a result image of all zeros. After some experimenting, I found that it works when I use only global memory and do not touch local memory. Is there anything wrong with the way the code accesses local memory?
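The host code above stores each return value in ret but never inspects it, so a failing call goes unnoticed. Below is a minimal sketch of a checking helper in plain C. The numeric codes are the standard values from CL/cl.h; the helper names cl_error_name and check_cl are hypothetical and not part of OpenCL.

```c
#include <stdio.h>

/* Maps a few OpenCL status codes to their names.
   The values are the standard ones defined in CL/cl.h;
   the function itself is a hypothetical helper. */
const char *cl_error_name(int err) {
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -48: return "CL_INVALID_KERNEL";
    case -49: return "CL_INVALID_ARG_INDEX";
    case -50: return "CL_INVALID_ARG_VALUE";
    case -51: return "CL_INVALID_ARG_SIZE";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    case -55: return "CL_INVALID_WORK_ITEM_SIZE";
    default:  return "unlisted OpenCL error";
    }
}

/* Returns 1 when err is CL_SUCCESS (0); otherwise logs the
   failing call to stderr and returns 0. */
int check_cl(int err, const char *what) {
    if (err == 0) {
        return 1;
    }
    fprintf(stderr, "%s failed: %d (%s)\n", what, err, cl_error_name(err));
    return 0;
}
```

A call site could then read, for example, if (!check_cl(ret, "clSetKernelArg(2)")) return 1; right after each OpenCL call.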
I figured it out by monitoring the error codes returned by the OpenCL calls. First, I got error code -48 (CL_INVALID_KERNEL) after the clSetKernelArg calls, which made me suspect that something was wrong with my kernel. I then dropped the third kernel parameter, which I had been using to pass the local memory, and declared the buffer with __local inside the kernel instead. At this point I got error code -51 (CL_INVALID_ARG_SIZE), which reminded me to check the work-group size limit of my hardware with the clinfo command. Having seen that limit, I changed localItemSize in dimension 0 from W to W/3. Then it worked.
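Dividing by 3 works here because the result both fits the device limit and divides the global size evenly, which OpenCL 1.x requires of every local size. A more general way to choose the value is to take the largest divisor of the global size that does not exceed the device limit. The sketch below does this in plain C; pick_local_size is a hypothetical helper, and the limit would normally come from clGetDeviceInfo with CL_DEVICE_MAX_WORK_GROUP_SIZE (the value 256 used in the note after the code is only an assumed example).

```c
#include <stddef.h>

/* Returns the largest divisor of global_size that does not exceed
   max_group_size, or 0 if either argument is 0.
   Hypothetical helper: max_group_size is assumed to have been read
   from the device beforehand. */
size_t pick_local_size(size_t global_size, size_t max_group_size) {
    if (global_size == 0 || max_group_size == 0) {
        return 0;
    }
    size_t n = global_size < max_group_size ? global_size : max_group_size;
    /* Walk down until n divides global_size; n == 1 always does. */
    while (global_size % n != 0) {
        --n;
    }
    return n;
}
```

The post never states W, but the localMem[212] declaration commented as one third of the width implies W = 636. With an assumed limit of 256, pick_local_size(636, 256) yields 212, the same value as W/3.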
Kernel code after modification:
__kernel void applyLowPassFilter(__global int *image, __global int *rst) {
int nCols = get_global_size(0); // width of image
int nRows = get_global_size(1); // height of image
int xg = get_global_id(0); // x index of global buffer
int yg = get_global_id(1); // y index of global buffer
int xl = get_local_id(0); // x index of local buffer
__local int localMem[212]; // 1/3 of image width
localMem[xl] = image[yg * nCols + xg];
barrier(CLK_LOCAL_MEM_FENCE);
if (yg != 0) {
rst[yg * nCols + xg] = (localMem[xl] + image[(yg - 1) * nCols + xg]) / 2;
}
}
Parameter configuration in the host code:
size_t globalItemSize[2];
size_t localItemSize[2];
globalItemSize[0] = W;
globalItemSize[1] = H;
localItemSize[0] = W / 3;
localItemSize[1] = 1;
// Set cl kernel arguments.
ret = clSetKernelArg(clKernel, 0, sizeof(cl_mem), (void *)&imageObj);
ret = clSetKernelArg(clKernel, 1, sizeof(cl_mem), (void *)&rstObj);
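Since the original symptom was an all-zero result image, it can help to compare the device output against a CPU reference after reading rst back. The following is a minimal sketch in plain C of the same averaging that the kernel performs; low_pass_reference and count_mismatches are hypothetical names, and the code assumes the image is stored row-major as in the kernel.

```c
#include <stddef.h>

/* CPU reference for the kernel: for every row y >= 1,
   rst[y][x] = (image[y][x] + image[y-1][x]) / 2.
   Row 0 of rst is left untouched, matching the kernel's `if (yg != 0)`. */
void low_pass_reference(const int *image, int *rst,
                        size_t n_cols, size_t n_rows) {
    for (size_t y = 1; y < n_rows; ++y) {
        for (size_t x = 0; x < n_cols; ++x) {
            rst[y * n_cols + x] =
                (image[y * n_cols + x] + image[(y - 1) * n_cols + x]) / 2;
        }
    }
}

/* Counts differing elements in rows 1..n_rows-1. Row 0 is skipped
   because neither the kernel nor the reference writes it. */
size_t count_mismatches(const int *expected, const int *actual,
                        size_t n_cols, size_t n_rows) {
    size_t bad = 0;
    for (size_t i = n_cols; i < n_cols * n_rows; ++i) {
        if (expected[i] != actual[i]) {
            ++bad;
        }
    }
    return bad;
}
```

A mismatch count of zero indicates that the local-memory version produces the same values as the straightforward computation.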