简体   繁体   中英

Any way to speed up this Python code?

I've written some Python code to do some image processing work, but it takes a huge amount of time to run. I've spent the last few hours trying to optimize it, but I think I've reached the end of my abilities.

Looking at the outputs from the profiler, the function below is taking a large proportion of the overall time of my code. Is there any way that it can be speeded up?

def make_ellipse(x, x0, y, y0, theta, a, b):
    c = np.cos(theta)
    s = np.sin(theta)
    a2 = a**2
    b2 = b**2
    xnew = x - x0
    ynew = y - y0
    ellipse = (xnew * c + ynew * s)**2/a2 + (xnew * s - ynew * c)**2/b2 <= 1

    return ellipse

To give the context, it is called with x and y as the output from np.meshgrid with a fairly large grid size, and all of the other parameters as simple integer values.

Although that function seems to be taking a lot of the time, there are probably ways that the rest of the code can be speeded up too. I've put the rest of the code at this gist .

Any ideas would be gratefully received. I've tried using numba and autojit ing the main functions, but that doesn't help much.

Let's try to optimize make_ellipse in conjunction with its caller.

First, notice that a and b are the same over many calls. Since make_ellipse squares them each time, just have the caller do that instead.

Second, notice that np.cos(np.arctan(theta)) is 1 / np.sqrt(1 + theta**2) which seems slightly faster on my system. A similar trick can be used to compute the sine, either from theta or from cos(theta) (or vice versa).

Third, and less concretely, think about short-circuiting some of the final ellipse formula evaluations. For example, wherever (xnew * c + ynew * s)**2/a2 is greater than 1, the ellipse value must be False. If this happens often, you can "mask" out the second half of the (expensive) calculation of the ellipse at those locations. I haven't planned this thoroughly, but see numpy.ma for some possible leads.

It won't speed up things for all cases, but if your ellipses don't take up the whole image, you should limit your search for points inside the ellipse to its bounding rectangle. I am lazy with the math, so I googled it and reused @JohnZwinck neat cosine of an arctangent trick to come up with this function:

def ellipse_bounding_box(x0, y0, theta, a, b):
    x_tan_t = -b * np.tan(theta) /  a
    if np.isinf(x_tan_t) :
        x_cos_t = 0
        x_sin_t = np.sign(x_tan_t)
    else :
        x_cos_t = 1 / np.sqrt(1 + x_tan_t*x_tan_t)
        x_sin_t = x_tan_t * x_cos_t
    x = x0 + a*x_cos_t*np.cos(theta) - b*x_sin_t*np.sin(theta)

    y_tan_t = b / np.tan(theta) /  a
    if np.isinf(y_tan_t):
        y_cos_t = 0
        y_sin_t = np.sign(y_tan_t)
    else:
        y_cos_t = 1 / np.sqrt(1 + y_tan_t*y_tan_t)
        y_sin_t = y_tan_t * y_cos_t
    y = y0 + b*y_sin_t*np.cos(theta) + a*y_cos_t*np.sin(theta)

    return np.sort([-x, x]), np.sort([-y, y])

You can now modify your original function to something like this:

def make_ellipse(x, x0, y, y0, theta, a, b):
    c = np.cos(theta)
    s = np.sin(theta)
    a2 = a**2
    b2 = b**2
    x_box, y_box = ellipse_bounding_box(x0, y0, theta, a, b)
    indices = ((x >= x_box[0]) & (x <= x_box[1]) & 
               (y >= y_box[0]) & (y <= y_box[1]))
    xnew = x[indices] - x0
    ynew = y[indices] - y0
    ellipse = np.zeros_like(x, dtype=np.bool)
    ellipse[indices] = ((xnew * c + ynew * s)**2/a2 +
                        (xnew * s - ynew * c)**2/b2 <= 1)
    return ellipse

Since everything but x and y are integers, you can try to minimize the number of array computations. I imagine most of the time is spent in this statement:

ellipse = (xnew * c + ynew * s)**2/a2 + (xnew * s - ynew * c)**2/b2 <= 1

A simple rewriting like so should reduce the number of array operations:

a = float(a)
b = float(b)
ellipse = (xnew * (c/a) + ynew * (s/a))**2 + (xnew * (s/b) - ynew * (c/b))**2 <= 1

What was 12 array operations is now 10 (plus 4 scalar ops). I'm not sure if numba's jit would have tried this. It might just do all the broadcasting first, then jit the resulting operations. In this case, reordering so common operations are done at once should help.

Furthering along, you can rewrite this again as

ellipse = ((xnew + ynew * (s/c)) * (c/a))**2 + ((xnew * (s/c) - ynew) * (c/b))**2 <= 1

Or

t = numpy.tan(theta)
ellipse = ((xnew + ynew * t) * (b/a))**2 + (xnew * t - ynew)**2 <= (b/c)**2

Replacing one more array operation with a scalar, and eliminating other scalar ops to get 9 array operations and 2 scalar ops.

As always, be aware of what the range of inputs are to avoid rounding errors.

Unfortunately there's no way good way to do a running sum and bail early if either of the two addends is greater than the right hand side of the comparison. That would be an obvious speed-up, but one you'd need cython (or c/c++) to code.

You can speed it up considerably by using Cython. There is a very good documentation on how to do this.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM