OpenCL Ndrange Global Size/Local Size

I am trying to detect a circle in a binary image using the Hough transform. My problem is with the global and local work sizes passed to clEnqueueNDRangeKernel: I don't know the optimal values. For global_work_size I use the dimensions of the image being processed (512*512). For local_work_size, values of 1, 8 or 16 work and the program runs correctly. When I change the value to 32 or 64, the code still compiles and the program runs faster, but the accumulator in[] contains no results.

The image size is 512*512, and the host code is:

    size_t szGlobalWorkSize[2] = {(size_t)img.cols, (size_t)img.rows};
    size_t szLocalWorkSize[2] = {16, 16};

    clEnqueueNDRangeKernel(clCommandQueue, hough_circle, 2, NULL, szGlobalWorkSize, szLocalWorkSize, 0, NULL, &event);

The kernel code is:

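    // sin_parameter and cos_parameter are not shown in the question; they are
    // assumed to be lookup tables (one entry per degree) defined elsewhere.
    kernel void hough_circle(read_only image2d_t imageIn, global int* in, const int w_hough)
    {
        const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;
        int gid0 = get_global_id(0);
        int gid1 = get_global_id(1);
        uint4 pixel = read_imageui(imageIn, sampler, (int2)(gid0, gid1));
        if (pixel.x == 255)   // edge pixel: vote for every candidate centre
        {
            for (int r = 90; r < 110; r += 1)
            {
                for (int theta = 0; theta < 360; theta++)
                {
                    int x0 = (int) round(gid0 - r * sin_parameter[theta]);
                    int y0 = (int) round(gid1 - r * cos_parameter[theta]);
                    if ((x0 > 0) && (x0 < get_global_size(0)) && (y0 > 0) && (y0 < get_global_size(1)))
                        // atom_inc needs the cl_khr_global_int32_base_atomics extension;
                        // OpenCL 1.1 and later provide atomic_inc as a core function.
                        atom_inc(&in[w_hough * y0 + x0]);
                }
            }
        }
    }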

Any help selecting optimal values for the global and local sizes would be appreciated.

Two things:

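  1. You can't make local_work_size arbitrarily large. Each dimension must be less than or equal to the corresponding value that clGetDeviceInfo returns for CL_DEVICE_MAX_WORK_ITEM_SIZES, and the product of all dimensions must be less than or equal to the value it returns for CL_DEVICE_MAX_WORK_GROUP_SIZE. That maximum is 128 on some GPUs, so even 16x16 (256 work-items) is too large for some hardware. 32x32 (1024 work-items) isn't going to work on most GPUs. When the limit is exceeded, clEnqueueNDRangeKernel returns CL_INVALID_WORK_GROUP_SIZE and the kernel never runs, which would explain a fast run with an empty accumulator, so check its return value.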

  2. If you specify the local_work_size, the global_work_size must be an integer multiple of the local_work_size in each dimension (if you're on OpenCL 1.x).
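The two rules above can be sketched as a pair of small host-side helpers. This is only an illustration, not part of any OpenCL API: the names local_size_fits and round_up are hypothetical, and the limits are assumed to have been read beforehand with clGetDeviceInfo (CL_DEVICE_MAX_WORK_ITEM_SIZES and CL_DEVICE_MAX_WORK_GROUP_SIZE) and passed in as plain values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if a work-group shape respects both limits,
 * 0 otherwise. max_item_sizes and max_group_size are assumed to hold the
 * values reported by clGetDeviceInfo for the target device. */
int local_size_fits(const size_t *local, const size_t *max_item_sizes,
                    size_t dims, size_t max_group_size)
{
    size_t total = 1;
    for (size_t d = 0; d < dims; ++d) {
        /* Rule 1a: each dimension is bounded individually. */
        if (local[d] == 0 || local[d] > max_item_sizes[d])
            return 0;
        total *= local[d];
    }
    /* Rule 1b: the product of all dimensions is bounded as well. */
    return total <= max_group_size;
}

/* Hypothetical helper: rounds a global size up to the next multiple of the
 * local size, so that rule 2 holds in that dimension. */
size_t round_up(size_t global, size_t local)
{
    assert(local > 0);
    return ((global + local - 1) / local) * local;
}
```

With a 512*512 image and a 16x16 work-group no padding is needed, since 512 is already a multiple of 16. If the global size does get padded, the kernel would need an early return for work-items whose IDs fall outside the image (for example by passing the real width and height as extra arguments), because with CLK_ADDRESS_CLAMP_TO_EDGE those extra work-items would otherwise read edge pixels and vote a second time.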
