
A more efficient alternative to computing the Euclidean distance in a k-NN algorithm

I have implemented the k-NN algorithm, and this is my function to calculate the Euclidean distance:

import math

def euc_dist(self, train, test):
    # Euclidean distance between two 2-D points
    return math.sqrt((train[0] - test[0]) ** 2 + (train[1] - test[1]) ** 2)

def euc_distance(self, test):
    # One row per test point, one column per training point
    eu_dist = []
    for i in range(len(test)):
        distance = [self.euc_dist(self.X_train[j], test[i]) for j in range(len(self.X_train))]
        eu_dist.append(distance)
    return eu_dist

Is there a more efficient way to perform the distance calculation?

(1) Python loops are slow. Use array computations instead, e.g. with NumPy:

import numpy as np

x = np.array(...)
y = np.array(...)
distances = np.sqrt(np.sum((x-y)**2)) 

Batching the computations allows for efficient vectorized or even parallel implementations.
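A minimal sketch of the batched version, assuming the training and test points are stored as NumPy arrays of shape (n_train, d) and (n_test, d). The function name `pairwise_euclidean` is hypothetical. Broadcasting computes every test-to-train distance in one call and returns the same layout as `euc_distance` in the question (one row per test point).

```python
import numpy as np

def pairwise_euclidean(X_test: np.ndarray, X_train: np.ndarray) -> np.ndarray:
    """Return an (n_test, n_train) matrix of Euclidean distances.

    Hypothetical helper: X_test has shape (n_test, d), X_train has shape (n_train, d).
    """
    # Insert axes so the subtraction broadcasts to shape (n_test, n_train, d).
    diff = X_test[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    # Sum squared differences over the feature axis, then take the square root.
    return np.sqrt(np.sum(diff ** 2, axis=-1))

X_train = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
X_test = np.array([[0.0, 0.0], [3.0, 0.0]])
D = pairwise_euclidean(X_test, X_train)
print(D)
```

The intermediate `diff` array holds n_test * n_train * d values, so for large inputs it may be necessary to process the test points in chunks. SciPy's `scipy.spatial.distance.cdist` computes the same kind of matrix if SciPy is available.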

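(2) If you don't need the actual distance values (e.g. you only compare their magnitudes, as k-NN does when it selects the nearest neighbors), omit the square root, which is comparatively expensive. This is possible because sqrt is monotonically increasing on non-negative numbers, so omitting it preserves the ordering of the distances. It does not carry over to derived quantities such as averages: the mean of squared distances can rank two sets of points differently from the mean of the distances themselves.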

squared_distances = np.sum((x-y)**2)
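A sketch of how squared distances are enough to pick the k nearest neighbors, assuming `X_train` is an (n, d) NumPy array, `x` is a single query point and 1 <= k <= n. The helper name `k_nearest_indices` is hypothetical. `np.argpartition` selects the k smallest values without fully sorting the array; only those k are then sorted.

```python
import numpy as np

def k_nearest_indices(x: np.ndarray, X_train: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k training points closest to x, nearest first (hypothetical helper)."""
    # Squared Euclidean distances; no sqrt, because only the ordering matters.
    sq = np.sum((X_train - x) ** 2, axis=1)
    # Move the k smallest squared distances to the front (in arbitrary order).
    idx = np.argpartition(sq, k - 1)[:k]
    # Order just those k candidates from nearest to farthest.
    return idx[np.argsort(sq[idx])]

X_train = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [2.0, 0.0]])
x = np.array([0.5, 0.0])
nearest = k_nearest_indices(x, X_train, k=2)
print(nearest)  # → [0 2]
```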

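(3) There may be distance definitions other than Euclidean that are meaningful for your specific problem. You could look for one that is simpler and faster to compute, e.g. the absolute difference (summed over the features, this is the Manhattan or L1 distance). A plain signed subtraction is not a distance on its own, since it can be negative.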

error = x-y
absolute_error = np.abs(x-y)
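A minimal sketch of the L1 (Manhattan) variant in the same batched style, assuming (n_test, d) and (n_train, d) NumPy arrays; `pairwise_manhattan` is a hypothetical name. It avoids both the squaring and the square root, but whether it is actually faster, and whether it suits the problem, should be checked rather than assumed.

```python
import numpy as np

def pairwise_manhattan(X_test: np.ndarray, X_train: np.ndarray) -> np.ndarray:
    """(n_test, n_train) matrix of L1 (Manhattan) distances; hypothetical helper."""
    # Absolute differences broadcast to (n_test, n_train, d), summed over features.
    diff = X_test[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    return np.sum(np.abs(diff), axis=-1)

X_train = np.array([[0.0, 0.0], [3.0, 4.0]])
X_test = np.array([[1.0, 1.0]])
D1 = pairwise_manhattan(X_test, X_train)
print(D1)  # → [[2. 5.]]
```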

(4) In all cases, measure (profile). Do not rely on intuition when optimizing runtime performance.
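A rough profiling sketch using the standard `timeit` module, comparing a loop in the style of the question's `euc_distance` with a broadcast NumPy version. The array sizes, the random data and both function names are assumptions for illustration; no particular timings are implied, since the results depend on the machine and the data sizes.

```python
import math
import timeit

import numpy as np

rng = np.random.default_rng(0)
X_train = rng.random((1000, 2))  # assumed sizes, purely illustrative
X_test = rng.random((20, 2))

def loop_distances(X_test, X_train):
    # Pure-Python double loop, similar to the question's euc_distance.
    return [[math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) for a in X_train]
            for b in X_test]

def vectorized_distances(X_test, X_train):
    # Broadcast to (n_test, n_train, 2) and reduce over the last axis.
    diff = X_test[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

# Check that both versions agree before comparing their speed.
assert np.allclose(loop_distances(X_test, X_train), vectorized_distances(X_test, X_train))

for fn in (loop_distances, vectorized_distances):
    t = timeit.timeit(lambda: fn(X_test, X_train), number=5)
    print(f"{fn.__name__}: {t / 5:.5f} s per call")
```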

PS: The code snippets above deliberately do not map exactly onto your code. Adapting them is up to you. Hint: 2D arrays ;)

You can use squared distances (just remove the math.sqrt call, which is relatively slow) if you only need them for comparisons.

Another possible optimization: if the Python expression (train[0] - test[0]) ** 2 is evaluated as a general exponentiation, it may be worth replacing it with a simple multiplication:

def squared_euc_dist(self, train, test):
    x = train[0] - test[0]
    y = train[1] - test[1]
    return x * x + y * y
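A sketch of how the squared distance can drive a complete pure-Python k-NN prediction, assuming 2-D points, a parallel list of labels and a majority vote. `knn_predict` and the sample data are hypothetical; `squared_euc_dist` is the function above written without `self`. `heapq.nsmallest` keeps only the k best candidates instead of sorting every distance.

```python
import heapq
from collections import Counter

def squared_euc_dist(train, test):
    # Same arithmetic as the method above, written as a free function.
    x = train[0] - test[0]
    y = train[1] - test[1]
    return x * x + y * y

def knn_predict(X_train, y_train, point, k=3):
    """Majority label among the k nearest training points (hypothetical helper)."""
    # Pair each squared distance with its label; no sqrt is needed to rank them.
    scored = ((squared_euc_dist(p, point), label) for p, label in zip(X_train, y_train))
    nearest = heapq.nsmallest(k, scored, key=lambda pair: pair[0])
    # most_common(1) returns [(label, count)] for the most frequent label.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

X_train = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
y_train = ["a", "a", "a", "b", "b"]
print(knn_predict(X_train, y_train, (0.5, 0.5)))       # → a
print(knn_predict(X_train, y_train, (5.5, 5.0), k=2))  # → b
```

When two labels tie in the vote, `Counter.most_common` returns the one encountered first, so an odd k is a common choice for two-class problems.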
