Hough transform: improving algorithm efficiency with OpenCL

Question

I am trying to detect a circle in a binary image using the Hough transform.

When I use OpenCV's built-in function for the circular Hough transform, it works and I can find the circle.

Now I am trying to write my own kernel code for the Hough transform, but it is very, very slow:

 kernel void hough_circle(read_only image2d_t imageIn, global int* in, const int w_hough, __global int* circle)
 {
     sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;
     int gid0 = get_global_id(0);
     int gid1 = get_global_id(1);
     uint4 pixel;
     int x0 = 0, y0 = 0, r;
     int maxval = 0;
     pixel = read_imageui(imageIn, sampler, (int2)(gid0, gid1));
     if (pixel.x == 255)
     {
         for (int r = 20; r < 150; r += 2)
         {
             // int r=100;
             for (int theta = 0; theta < 360; theta += 2)
             {
                 x0 = (int) round(gid0 - r * cos((float) radians((float) theta)));
                 y0 = (int) round(gid1 - r * sin((float) radians((float) theta)));
                 if ((x0 > 0) && (x0 < get_global_size(0)) && (y0 > 0) && (y0 < get_global_size(1)))
                     atom_inc(&in[w_hough * y0 + x0]);
             }
             if (maxval < in[w_hough * y0 + x0])
             {
                 maxval = in[w_hough * y0 + x0];
                 circle[0] = gid0;
                 circle[1] = gid1;
                 circle[2] = r;
             }
         }
     }
 }

The OpenCV sources include an OpenCL implementation of the Hough transform, but it is hard for me to extract the specific function that would help me.

Can anyone offer a better source code example, or help me understand why this is so inefficient? The code (main.cpp and kernel.cl) is compressed in a rar file at http://www.files.com/set/527152684017e and uses the OpenCV library to read and display the image.

Answer 1

Making repeated calls to sin() and cos() is computationally expensive. Since you only ever call these functions with the same 180 values of theta, you could speed things up by precalculating these values and storing them in an array.

A more robust approach would be to use the midpoint circle algorithm to trace the perimeters of these circles using only simple integer arithmetic.
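For reference, a minimal sketch of midpoint-circle voting in plain C might look like the following. The helper names `vote` and `vote_circle` and the row-major accumulator layout are assumptions for this example; in a kernel the plain increment would have to become an atomic one, as in the question.

```c
/* Add one vote at (x, y) if it lies inside the w x h accumulator.
   Returns 1 if a vote was cast, 0 if the point was clipped. */
static int vote(int *acc, int w, int h, int x, int y)
{
    if (x < 0 || x >= w || y < 0 || y >= h)
        return 0;
    acc[y * w + x] += 1;  /* would be an atomic increment inside a kernel */
    return 1;
}

/* Vote along the perimeter of the circle centred at (cx, cy) with radius r,
   using the midpoint circle algorithm: integer additions only, no sin/cos.
   Returns the number of votes cast. Points on the axes (y == 0) and on the
   diagonals (x == y) are visited twice in this simple form; a more careful
   version would special-case them. */
static int vote_circle(int *acc, int w, int h, int cx, int cy, int r)
{
    int x = r, y = 0;
    int d = 1 - r;  /* decision variable */
    int n = 0;

    while (y <= x) {
        /* One computed point gives eight by symmetry. */
        n += vote(acc, w, h, cx + x, cy + y);
        n += vote(acc, w, h, cx - x, cy + y);
        n += vote(acc, w, h, cx + x, cy - y);
        n += vote(acc, w, h, cx - x, cy - y);
        n += vote(acc, w, h, cx + y, cy + x);
        n += vote(acc, w, h, cx - y, cy + x);
        n += vote(acc, w, h, cx + y, cy - x);
        n += vote(acc, w, h, cx - y, cy - x);

        ++y;
        if (d <= 0) {
            d += 2 * y + 1;        /* midpoint inside the circle: keep x */
        } else {
            --x;                   /* midpoint outside: step x inwards */
            d += 2 * (y - x) + 1;
        }
    }
    return n;
}
```

Besides avoiding trigonometry, this visits every pixel on the perimeter, whereas a fixed 2-degree step leaves gaps on large circles and revisits pixels on small ones.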

Answer 2

You are running a huge, CPU-style block of code inside a single work-item. The result, as expected, is a slow kernel.

Detailed answer: The only place where you use the work-item ID is to read the pixel value; if the condition is met, you then run a big chunk of code. Some of the work-items will trigger this and some of them won't. The ones that trigger it indirectly force the whole work-group to run that code, and this will slow you down.

In addition, the work-items that don't enter that condition will be idle. Depending on the image, maybe 99% of them are idle.

I would rewrite your algorithm to use 1 work-group per pixel. If the condition is met, the work-group will run the algorithm; if it is not, the whole work-group will skip it. When the work-group does enter the condition, you will have many work-items to play with. This allows a redesign of the code such that the inner for loops run in parallel.
