Is there any way to speed up this Python code?

Question

I've written some Python code to do some image processing work, but it takes a huge amount of time to run. I've spent the last few hours trying to optimize it, but I think I've reached the end of my abilities.

Looking at the profiler output, the function below takes a large proportion of my code's overall run time. Is there any way to speed it up?

import numpy as np

def make_ellipse(x, x0, y, y0, theta, a, b):
    c = np.cos(theta)
    s = np.sin(theta)
    a2 = a**2
    b2 = b**2
    xnew = x - x0
    ynew = y - y0
    ellipse = (xnew * c + ynew * s)**2/a2 + (xnew * s - ynew * c)**2/b2 <= 1

    return ellipse

For context, it is called with x and y as the output from np.meshgrid with a fairly large grid size, and all of the other parameters as simple integer values.

Although that function seems to be taking a lot of the time, there are probably ways to speed up the rest of the code too. I've put the rest of the code at this gist.

Any ideas would be gratefully received. I've tried using numba and autojit-ing the main functions, but that doesn't help much.

Answer 1

Let's try to optimize make_ellipse in conjunction with its caller.

First, notice that a and b are the same over many calls. Since make_ellipse squares them each time, just have the caller do that once instead.

Second, notice that np.cos(np.arctan(theta)) equals 1 / np.sqrt(1 + theta**2), which seems slightly faster on my system. A similar trick can be used to compute the sine, either from theta or from cos(theta) (or vice versa).
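As a minimal sketch of that identity (the caller's code lives in the gist and is not shown here, so the helper name cos_sin_from_slope and the value m are hypothetical), both the cosine and the sine of arctan(m) can be obtained from one square root:

```python
import numpy as np

def cos_sin_from_slope(m):
    """Return (cos, sin) of arctan(m) without calling a trig function.

    Hypothetical helper for illustration; m is the value that would
    otherwise be passed to np.arctan.
    """
    c = 1.0 / np.sqrt(1.0 + m * m)  # cos(arctan(m))
    s = m * c                       # sin(arctan(m)) = tan * cos
    return c, s

m = 0.75  # made-up slope
c, s = cos_sin_from_slope(m)
```

This only pays off if the caller really does build the angle with arctan; if theta is given directly, np.cos and np.sin on a scalar are already cheap compared with the array arithmetic.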

Third, and less concretely, think about short-circuiting some of the final ellipse formula evaluations. For example, wherever (xnew * c + ynew * s)**2/a2 is greater than 1, the ellipse value must be False. If this happens often, you can "mask" out the second half of the (expensive) calculation of the ellipse at those locations. I haven't planned this thoroughly, but see numpy.ma for some possible leads.
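One possible way to sketch the short-circuit idea is shown below. Note that it uses plain boolean indexing rather than numpy.ma, and the function name make_ellipse_masked and the grid sizes are made up for illustration. Whether it is actually faster depends on how many points fail the first test, since the indexing itself costs time.

```python
import numpy as np

def make_ellipse_masked(x, x0, y, y0, theta, a, b):
    c, s = np.cos(theta), np.sin(theta)
    xnew = x - x0
    ynew = y - y0
    # First addend, evaluated everywhere.
    first = (xnew * c + ynew * s) ** 2 / a ** 2
    # Only points whose first addend is <= 1 can be inside the ellipse.
    candidates = first <= 1
    xc = xnew[candidates]
    yc = ynew[candidates]
    # Second addend, evaluated only at the candidate points.
    second = (xc * s - yc * c) ** 2 / b ** 2
    ellipse = np.zeros(x.shape, dtype=bool)
    ellipse[candidates] = first[candidates] + second <= 1
    return ellipse

# Made-up grid and parameters for a quick try-out.
x, y = np.meshgrid(np.arange(200), np.arange(150))
mask = make_ellipse_masked(x, 100, y, 75, 0.3, 40, 15)
```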

Answer 2

It won't speed things up in all cases, but if your ellipses don't take up the whole image, you should limit your search for points inside the ellipse to its bounding rectangle. I am lazy with the math, so I googled it and reused @JohnZwinck's neat cosine-of-an-arctangent trick to come up with this function:

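def ellipse_bounding_box(x0, y0, theta, a, b):
    # Parameter t where x is extremal: tan(t) = -b*tan(theta)/a
    x_tan_t = -b * np.tan(theta) / a
    if np.isinf(x_tan_t):
        x_cos_t = 0
        x_sin_t = np.sign(x_tan_t)
    else:
        x_cos_t = 1 / np.sqrt(1 + x_tan_t*x_tan_t)
        x_sin_t = x_tan_t * x_cos_t
    # Half-width of the box, measured from the centre x0
    dx = abs(a*x_cos_t*np.cos(theta) - b*x_sin_t*np.sin(theta))

    # Parameter t where y is extremal: tan(t) = b/(a*tan(theta))
    y_tan_t = b / np.tan(theta) / a
    if np.isinf(y_tan_t):
        y_cos_t = 0
        y_sin_t = np.sign(y_tan_t)
    else:
        y_cos_t = 1 / np.sqrt(1 + y_tan_t*y_tan_t)
        y_sin_t = y_tan_t * y_cos_t
    # Half-height of the box, measured from the centre y0
    dy = abs(b*y_sin_t*np.cos(theta) + a*y_cos_t*np.sin(theta))

    # The box is symmetric about the centre (x0, y0), not about the origin
    return np.array([x0 - dx, x0 + dx]), np.array([y0 - dy, y0 + dy])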

You can now modify your original function to something like this:

def make_ellipse(x, x0, y, y0, theta, a, b):
    c = np.cos(theta)
    s = np.sin(theta)
    a2 = a**2
    b2 = b**2
    x_box, y_box = ellipse_bounding_box(x0, y0, theta, a, b)
    indices = ((x >= x_box[0]) & (x <= x_box[1]) & 
               (y >= y_box[0]) & (y <= y_box[1]))
    xnew = x[indices] - x0
    ynew = y[indices] - y0
    ellipse = np.zeros_like(x, dtype=bool)
    ellipse[indices] = ((xnew * c + ynew * s)**2/a2 +
                        (xnew * s - ynew * c)**2/b2 <= 1)
    return ellipse
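A variation on the same idea, sketched under the assumption that x and y come from np.meshgrid over sorted 1-D axes (called xs and ys here, which are hypothetical names), is to slice the sub-grid out instead of building a full-size boolean index. It uses the closed-form half-extents of a rotated ellipse, sqrt((a cos θ)² + (b sin θ)²) and sqrt((a sin θ)² + (b cos θ)²), rather than the function above, and pads the box by one sample to guard against rounding at the edges.

```python
import numpy as np

def make_ellipse_sliced(xs, ys, x0, y0, theta, a, b):
    """xs, ys: sorted 1-D axes that would otherwise go to np.meshgrid."""
    c, s = np.cos(theta), np.sin(theta)
    # Half-width and half-height of the rotated ellipse's bounding box.
    dx = np.hypot(a * c, b * s)
    dy = np.hypot(a * s, b * c)
    # Index range covering the box, padded by one sample on each side.
    i0 = max(np.searchsorted(xs, x0 - dx, side="left") - 1, 0)
    i1 = min(np.searchsorted(xs, x0 + dx, side="right") + 1, len(xs))
    j0 = max(np.searchsorted(ys, y0 - dy, side="left") - 1, 0)
    j1 = min(np.searchsorted(ys, y0 + dy, side="right") + 1, len(ys))
    # Evaluate the ellipse test only on the sub-grid.
    xsub, ysub = np.meshgrid(xs[i0:i1], ys[j0:j1])
    xnew = xsub - x0
    ynew = ysub - y0
    ellipse = np.zeros((len(ys), len(xs)), dtype=bool)
    ellipse[j0:j1, i0:i1] = ((xnew * c + ynew * s) ** 2 / a ** 2 +
                             (xnew * s - ynew * c) ** 2 / b ** 2 <= 1)
    return ellipse

# Made-up axes and parameters for a quick try-out.
xs = np.arange(300)
ys = np.arange(200)
mask = make_ellipse_sliced(xs, ys, 120, 80, 0.5, 40, 15)
```

Slices give views of the axes, so the only large temporaries are the size of the bounding box rather than the size of the whole image.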

Answer 3

Since everything except x and y is a scalar, you can try to minimize the number of array computations. I imagine most of the time is spent in this statement:

ellipse = (xnew * c + ynew * s)**2/a2 + (xnew * s - ynew * c)**2/b2 <= 1

A simple rewrite like this should reduce the number of array operations:

a = float(a)
b = float(b)
ellipse = (xnew * (c/a) + ynew * (s/a))**2 + (xnew * (s/b) - ynew * (c/b))**2 <= 1

What was 12 array operations is now 10 (plus 4 scalar ops). I'm not sure if numba's jit would have tried this. It might just do all the broadcasting first, then jit the resulting operations. In this case, reordering so that common operations are done only once should help.

Going further, you can rewrite this again as

ellipse = ((xnew + ynew * (s/c)) * (c/a))**2 + ((xnew * (s/c) - ynew) * (c/b))**2 <= 1

Or

t = np.tan(theta)
ellipse = ((xnew + ynew * t) * (b/a))**2 + (xnew * t - ynew)**2 <= (b/c)**2

This replaces one more array operation with a scalar one and eliminates other scalar ops, leaving 9 array operations and 2 scalar ops. Both of these forms divide by c, so they require cos(theta) to be nonzero.
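To check that the rewrites agree with the original expression, a small comparison along these lines can help. The function names and the grid are made up for illustration, and because the forms round differently, a point lying almost exactly on the boundary could in principle be classified differently.

```python
import numpy as np

def ellipse_original(xnew, ynew, theta, a, b):
    c, s = np.cos(theta), np.sin(theta)
    return (xnew * c + ynew * s) ** 2 / a ** 2 + (xnew * s - ynew * c) ** 2 / b ** 2 <= 1

def ellipse_prescaled(xnew, ynew, theta, a, b):
    # Scalars c/a, s/a, s/b, c/b are computed once, before any array op.
    c, s = np.cos(theta), np.sin(theta)
    a, b = float(a), float(b)
    return (xnew * (c / a) + ynew * (s / a)) ** 2 + (xnew * (s / b) - ynew * (c / b)) ** 2 <= 1

def ellipse_tan(xnew, ynew, theta, a, b):
    # Requires cos(theta) != 0, since it divides by c.
    c = np.cos(theta)
    t = np.tan(theta)
    a, b = float(a), float(b)
    return ((xnew + ynew * t) * (b / a)) ** 2 + (xnew * t - ynew) ** 2 <= (b / c) ** 2

# Made-up grid, already shifted so the centre sits at (150, 100).
x, y = np.meshgrid(np.arange(300), np.arange(200))
xnew, ynew = x - 150, y - 100
results = [f(xnew, ynew, 0.3, 40, 15)
           for f in (ellipse_original, ellipse_prescaled, ellipse_tan)]
# Fraction of pixels where each rewrite disagrees with the original.
mismatch = [np.mean(r != results[0]) for r in results[1:]]
```

Wrapping each function in timeit on your own grid size is the quickest way to see whether the saved array operations matter in practice.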

As always, be aware of the range of your inputs to avoid rounding errors.

Unfortunately there's no good way to do a running sum and bail out early if either of the two addends is greater than the right-hand side of the comparison. That would be an obvious speed-up, but one you'd need Cython (or C/C++) to code.

Answer 4

You can speed it up considerably by using Cython. There is very good documentation on how to do this.

4 answers

Answer 1: score 3, accepted, 2013-07-03 15:01:51
Answer 2: score 2, 2013-07-03 16:08:04
Answer 3: score 2, 2013-07-03 18:01:36
Answer 4: score 0, 2013-07-03 14:29:44
