How to efficiently calculate the distance between a large set of points and an arbitrary function?

Question

I have a large set of points and a curve defined by a mathematical function f(x), both in 2D Cartesian space. Examples of f(x) include polynomials, but also common continuous functions like sin(x), log(x) and exp(x). I would like to use Python to create a function that determines the smallest distance between every point and the curve.
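Written out, for a point (x_0, y_0) the quantity to minimise is the squared distance D(x) to a generic curve point (x, f(x)). Its stationary points satisfy D'(x) = 0, which is the equation the code below builds with sym.diff and solves with sym.solve. This assumes f is differentiable on the whole real line; for a restricted domain such as that of log(x), the domain endpoints would also need checking. For the example curve f(x) = x^2 + 5x + 10, D'(x) is a cubic:

```latex
D(x) = (x - x_0)^2 + \bigl(f(x) - y_0\bigr)^2, \qquad
D'(x) = 2(x - x_0) + 2\bigl(f(x) - y_0\bigr) f'(x) = 0,
\qquad d_{\min} = \sqrt{\min_{x^\ast :\, D'(x^\ast) = 0} D(x^\ast)}

f(x) = x^2 + 5x + 10 \;\Rightarrow\;
D'(x) = 4x^3 + 30x^2 + (92 - 4y_0)\,x + (100 - 2x_0 - 10y_0)
```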

My current solution is as follows, with an arbitrary function to run as an example:

import time
import numpy as np
import sympy as sym
import math

def get_dist_to_curve(curve_func, points):
    t_start = time.time()
    distances = []
    for i, point in enumerate(points):
        point_x = point[0]
        point_y = point[1]

        #Create sympy expression for the dist between the point being evaluated
        # and the closest point in the curve 
        sqrd_dist = (x-point_x)**2 + (curve_func-point_y)**2
        sqrd_dist_dx = sym.diff(sqrd_dist,x)
        #Solve for x coord of point where the derivative of the dist is zero
        eq_solutions = sym.solve(sqrd_dist_dx, x)
        
        #Make sure the equation solution that results in the smallest dist is used 
        best_dist = math.inf
        best_x = None
        best_y = None
        for k in range(len(eq_solutions)):
            closest_x=sym.re(eq_solutions[k])
            closest_y = curve_func.evalf(subs={x: closest_x})

            dist = math.sqrt((closest_x-point_x)**2 +(closest_y-point_y)**2)
            if dist<best_dist:
                best_x = closest_x
                best_y = closest_y
                best_dist = dist
        
        distances.append(best_dist)
        
    t_end = time.time()
    print("Time elapsed: ", t_end-t_start)
    return distances


points = np.random.uniform(0, 10, (1000000,2))

x = sym.Symbol('x')
curve_func = x**2+5*x+10

get_dist_to_curve(curve_func, points)

This solution works, but it takes very long to run with very large numbers of points (I am working with on the order of a million) and with more complex functions. How can I improve my code to reduce the total runtime?

Changing the logic behind the distance calculation or using new libraries is not a problem. The input should be kept an arbitrary mathematical function.

Answer 1

I guess that you are more concerned about speed than accuracy. In that case, reducing the problem to something that can be solved by NumPy's np.roots function should be fine. You didn't say that you are only interested in polynomials, but your example is a polynomial, so I'm presuming that. If you want to handle something more complicated, you will need to define the scope carefully, because this problem is impossible to solve in general if curve_func can be any arbitrary function.

We use SymPy to find the general expression for the squared distance and then differentiate it to get the equation for the stationary points (minima and maxima). Then we use lambdify to efficiently generate the polynomial coefficients to be passed to np.roots. A post-processing step is needed to figure out which of the roots returned by np.roots is the real root that you want. You can then substitute that back to find the squared distance:

In [56]: from sympy import *

In [57]: x, y = symbols('x, y')

In [58]: x1, y1 = symbols('x1, y1')

In [59]: curve_func = x**2+5*x+10

In [60]: dist2 = (x1 - x)**2 + (y1 - curve_func)**2

In [61]: dist2
Out[61]: 
                                   2
         2   ⎛   2                ⎞ 
(-x + x₁)  + ⎝- x  - 5⋅x + y₁ - 10⎠ 

In [62]: coeffs = Tuple(*dist2.diff(x).as_poly(x).all_coeffs())

In [63]: coeffs
Out[63]: (4, 30, 92 - 4⋅y₁, -2⋅x₁ - 10⋅y₁ + 100)

In [64]: poly_func = lambdify((x1, y1), coeffs)

In [65]: poly_func(2, 3)
Out[65]: (4, 30, 80, 66)

In [66]: np.roots(_)
Out[66]: array([-3. +1.41421356j, -3. -1.41421356j, -1.5+0.j        ])

In [67]: dist2.subs(x, _[-1]).subs({x1:2, y1:3})
Out[67]: 15.3125000000000
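The session above picks the real root by hand and stops at the squared distance. A minimal end-to-end sketch for one point, assuming the same polynomial curve, is below; dist_to_curve is a hypothetical helper name. Instead of deciding which root is "the" real one, it evaluates the squared distance at the real part of every root and keeps the smallest, in the same spirit as the sym.re step in the question's code. Complex roots then only add harmless extra candidates (each real x is still a point on the curve), and because D'(x) has odd degree for a polynomial curve, at least one root is real.

```python
import numpy as np
import sympy as sym

x, x1, y1 = sym.symbols("x x1 y1")
curve_func = x**2 + 5*x + 10

# Squared distance from (x1, y1) to the curve point (x, f(x)).
dist2 = (x1 - x)**2 + (y1 - curve_func)**2
# Coefficients of d(dist2)/dx, highest degree first, as a function of (x1, y1).
poly_func = sym.lambdify((x1, y1), sym.Tuple(*dist2.diff(x).as_poly(x).all_coeffs()),
                         modules="numpy")
# Vectorised squared distance, so all candidate x values are evaluated at once.
dist2_func = sym.lambdify((x, x1, y1), dist2, modules="numpy")


def dist_to_curve(px, py):
    """Hypothetical helper: smallest distance from (px, py) to the curve."""
    roots = np.roots(poly_func(px, py))
    # Real parts of all stationary points; complex roots just add extra candidates.
    candidates = roots.real
    return float(np.sqrt(np.min(dist2_func(candidates, px, py))))


print(dist_to_curve(2, 3))  # sqrt(15.3125), about 3.913
```

Taking the square root of the minimum squared distance gives the distance itself, which is what the question asks for.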

We could also collect the outputs of np.roots into an array and use lambdify to evaluate that.
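For the million points in the question, calling np.roots once per point in a Python loop is the bottleneck (about 1 ms each in the timings below). np.roots computes the eigenvalues of a companion matrix, and np.linalg.eigvals accepts a whole stack of matrices, so one way to batch the work is to build all the companion matrices at once. This is a sketch, assuming a polynomial curve (so the leading coefficient of D'(x) is a nonzero constant); batch_dist_to_curve is a hypothetical name:

```python
import numpy as np
import sympy as sym

x, x1, y1 = sym.symbols("x x1 y1")
curve_func = x**2 + 5*x + 10

dist2 = (x1 - x)**2 + (y1 - curve_func)**2
# One vectorised function per coefficient of d(dist2)/dx, highest degree first.
coeff_funcs = [sym.lambdify((x1, y1), c, modules="numpy")
               for c in dist2.diff(x).as_poly(x).all_coeffs()]
dist2_func = sym.lambdify((x, x1, y1), dist2, modules="numpy")


def batch_dist_to_curve(points):
    """Hypothetical helper: distances from an (N, 2) array of points to the curve."""
    px, py = points[:, 0], points[:, 1]
    n = len(points)
    # (N, deg + 1) coefficients; broadcast_to expands constants such as the leading 4.
    c = np.stack([np.broadcast_to(f(px, py), (n,)) for f in coeff_funcs],
                 axis=1).astype(float)
    deg = c.shape[1] - 1
    # Stack of companion matrices, laid out the way np.roots builds its single one:
    # first row -c[1:]/c[0], ones on the subdiagonal.
    comp = np.zeros((n, deg, deg))
    comp[:, 0, :] = -c[:, 1:] / c[:, :1]
    comp[:, np.arange(1, deg), np.arange(deg - 1)] = 1.0
    roots = np.linalg.eigvals(comp)  # (N, deg): roots of each point's polynomial
    # Real parts of all roots are the candidates; complex roots only add extra ones.
    d2 = dist2_func(roots.real, px[:, None], py[:, None])
    return np.sqrt(d2.min(axis=1))
```

Called with the question's (1000000, 2) points array, this returns a (1000000,) array of distances. For higher-degree curves the stacked matrices grow as N × deg², so processing the points in chunks may be needed.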

If you were more concerned about accuracy than speed, then it might make sense to use SymPy's real_roots function rather than np.roots:

In [71]: real_roots(Poly(coeffs.subs({x1: 2, y1:3}), x))
Out[71]: [-3/2]
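Wrapped into a function for one point, the real_roots route might look like the sketch below; exact_dist_to_curve is a hypothetical name. Converting the inputs to exact Rationals is an assumption made here so that the root isolation stays exact. real_roots can return CRootOf objects when the cubic has no rational root, so the candidate squared distances are evaluated with evalf before comparing them. This trades speed for accuracy, as described above.

```python
import sympy as sym

x, x1, y1 = sym.symbols("x x1 y1")
curve_func = x**2 + 5*x + 10
dist2 = (x1 - x)**2 + (y1 - curve_func)**2
ddist2 = dist2.diff(x)


def exact_dist_to_curve(px, py, digits=30):
    """Hypothetical helper: distance via exact real-root isolation."""
    px, py = sym.Rational(px), sym.Rational(py)  # exact inputs
    poly = sym.Poly(ddist2.subs({x1: px, y1: py}), x)
    # Only real stationary points are returned; some may be CRootOf objects,
    # so each candidate squared distance is evaluated numerically to `digits`.
    candidates = [dist2.subs({x: r, x1: px, y1: py}).evalf(digits)
                  for r in sym.real_roots(poly)]
    return sym.sqrt(min(candidates))
```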

On this not particularly powerful machine, finding one distance takes about 1 ms, almost all of it in np.roots (although it's probably faster in an intensive loop):

In [72]: %time poly_func(4, 5)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 19.3 µs
Out[72]: (4, 30, 72, 42)

In [73]: %time np.roots(_)
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 998 µs
Out[73]: 
array([-3.32468811+1.13592756j, -3.32468811-1.13592756j,
       -0.85062379+0.j        ])
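Everything above relies on D'(x) being a polynomial, which is why this answer restricts itself to polynomial curves. For the sin(x), log(x) or exp(x) cases in the question, one possible way to define the scope (an assumption, not something this answer proposes) is to search only on a bounded interval [a, b] on which f is twice differentiable, and to run a vectorised Newton iteration on D'(x) = 0 from several starting points per point, keeping the smallest squared distance seen. make_newton_dist, a, b, n_starts and n_iter are hypothetical names and defaults; minima outside [a, b] are missed, and too coarse a starting grid can miss a narrow minimum.

```python
import numpy as np
import sympy as sym

x = sym.Symbol("x")


def make_newton_dist(curve_func, a, b, n_starts=64, n_iter=20):
    """Hypothetical helper: returns dist(points) with the search restricted to [a, b]."""
    f = sym.lambdify(x, curve_func, modules="numpy")
    df = sym.lambdify(x, sym.diff(curve_func, x), modules="numpy")
    d2f = sym.lambdify(x, sym.diff(curve_func, x, 2), modules="numpy")

    def dist(points):
        px, py = points[:, 0:1], points[:, 1:2]  # (N, 1) columns for broadcasting
        # n_starts evenly spaced starting guesses per point, endpoints included.
        t = np.tile(np.linspace(a, b, n_starts), (len(points), 1))
        best = (t - px)**2 + (f(t) - py)**2  # squared distances at the starts
        for _ in range(n_iter):
            fx, f1, f2 = f(t), df(t), d2f(t)
            g = (t - px) + (fx - py) * f1        # D'(t) / 2
            h = 1.0 + f1**2 + (fx - py) * f2     # D''(t) / 2
            convex = h > 0                       # Newton step only where D'' > 0
            step = np.where(convex, g / np.where(convex, h, 1.0), 0.0)
            t = np.clip(t - step, a, b)
            # Keep the best squared distance over all iterates, never worse than the starts.
            best = np.minimum(best, (t - px)**2 + (f(t) - py)**2)
        return np.sqrt(best.min(axis=1))

    return dist
```

For example, make_newton_dist(sym.sin(x), -10.0, 10.0) returns a function that takes an (N, 2) array of points. Each intermediate array has N × n_starts entries, so for a million points the work would in practice be processed in chunks.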

(Accepted answer, 2022-06-18 00:54:16)
