
Alternative efficient way to compute distance instead of Euclidean distance in the kNN algorithm

I have implemented the kNN algorithm, and this is my function to calculate the Euclidean distance.

import math

def euc_dist(self, train, test):
    # Euclidean distance between two 2-D points
    return math.sqrt(((train[0] - test[0]) ** 2) + ((train[1] - test[1]) ** 2))

def euc_distance(self, test):
    # One row per test point: its distance to every training point
    eu_dist = []
    for i in range(len(test)):
        distance = [self.euc_dist(self.X_train[j], test[i]) for j in range(len(self.X_train))]
        eu_dist.append(distance)
    return eu_dist

Is there a better, more efficient way to perform the distance calculation?

(1) Python loops are extremely slow. Learn to use array computations, e.g. numpy:

import numpy as np

x = np.array(...)
y = np.array(...)
distances = np.sqrt(np.sum((x-y)**2)) 

Batching the computations allows for efficient vectorized or even parallel implementations.
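One way to adapt the snippet above to whole arrays, as a minimal sketch: assuming the training and test points are stored as 2D arrays of shape (n_train, n_features) and (n_test, n_features), broadcasting computes every test-to-train distance in one expression. The function name pairwise_euclidean and the sample data are made up for illustration.

```python
import numpy as np

def pairwise_euclidean(X_train, X_test):
    """Return an (n_test, n_train) matrix of Euclidean distances."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    # (n_test, 1, d) - (1, n_train, d) broadcasts to (n_test, n_train, d)
    diff = X_test[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

# Hypothetical sample data: three training points, two test points
X_train = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
X_test = [[0.0, 0.0], [3.0, 0.0]]
distances = pairwise_euclidean(X_train, X_test)
print(distances)
```

The intermediate array holds n_test * n_train * n_features values, so for large data sets it may be necessary to process the test points in chunks.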

(2) If you don't need absolute distance values (e.g. you only compare their magnitude or average or normalize the result somehow), then omit the square root operation, which adds avoidable cost. Omission is possible because sqrt is a monotonically increasing function on non-negative values (i.e. omitting it preserves the total order).

squared_distances = np.sum((x-y)**2)
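For kNN specifically, only the ranking of the neighbours matters, so squared distances are enough to pick them. A minimal sketch, assuming the training points form a 2D array and the query is a single point; k_nearest_indices is a hypothetical helper name, not part of any library.

```python
import numpy as np

def k_nearest_indices(X_train, query, k):
    """Indices of the k training points closest to query, nearest first."""
    X_train = np.asarray(X_train, dtype=float)
    query = np.asarray(query, dtype=float)
    # Squared distance from the query to every training point; no sqrt needed
    sq_dist = np.sum((X_train - query) ** 2, axis=1)
    # A stable sort keeps tied points in index order
    return np.argsort(sq_dist, kind="stable")[:k]

# Hypothetical sample data
X_train = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [2.0, 0.0]]
nearest = k_nearest_indices(X_train, [0.0, 0.0], k=2)
print(nearest)
```

For large training sets, np.argpartition can select the k smallest values without fully sorting, at the cost of returning them in no particular order.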

(3) There may be distance definitions other than Euclidean that are meaningful for your specific problem. You may try to find a definition that is simpler and faster, e.g. a simple subtraction or absolute error.

error = x-y
absolute_error = np.abs(x-y)
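A plain difference or elementwise absolute error still yields one value per coordinate, so to compare neighbours it has to be reduced to a single number per pair of points. A minimal sketch, assuming the Manhattan (L1) distance, i.e. the sum of absolute coordinate differences, suits the problem. This is a swapped-in metric and can rank neighbours differently from Euclidean distance; pairwise_manhattan is a hypothetical name.

```python
import numpy as np

def pairwise_manhattan(X_train, X_test):
    """Return an (n_test, n_train) matrix of Manhattan (L1) distances."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    # Broadcast to (n_test, n_train, d), then sum |difference| over features
    diff = X_test[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    return np.sum(np.abs(diff), axis=-1)

# Hypothetical sample data: three training points, two test points
X_train = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
X_test = [[0.0, 0.0], [3.0, 0.0]]
l1 = pairwise_manhattan(X_train, X_test)
print(l1)
```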

(4) In all cases, try and measure (profile). Do not rely on intuition when you deal with runtime performance optimization.
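A simple way to measure is the standard-library timeit module. The sketch below compares a nested-loop version with a broadcast NumPy version on random data; the array sizes, the seed, and the function names are arbitrary choices for illustration, and the timings will differ from machine to machine.

```python
import math
import timeit

import numpy as np

# Hypothetical benchmark data: 500 training points and 50 test points in 2-D
rng = np.random.default_rng(0)
X_train = rng.random((500, 2))
X_test = rng.random((50, 2))

def loop_version():
    # Pure-Python nested loops, one math.sqrt call per pair of points
    return [[math.sqrt((tr[0] - te[0]) ** 2 + (tr[1] - te[1]) ** 2)
             for tr in X_train] for te in X_test]

def numpy_version():
    # Same computation expressed with broadcasting
    diff = X_test[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

# Check that both versions agree before comparing their speed
assert np.allclose(loop_version(), numpy_version())

loop_time = timeit.timeit(loop_version, number=5)
numpy_time = timeit.timeit(numpy_version, number=5)
print(f"loop:  {loop_time:.4f} s for 5 runs")
print(f"numpy: {numpy_time:.4f} s for 5 runs")
```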

P.S. The code snippets above do not map to your code exactly (on purpose). It is up to you to learn how to adapt them. Hint: 2D arrays ;)

You can use squared distances (just remove math.sqrt, which is a slow operation) if they are needed for comparisons only.

Possible optimization: if Python evaluates (train[0] - test[0]) ** 2 through general exponentiation, it is worth changing it to a simple multiplication:

def squared_euc_dist(self, train, test):
    x = train[0] - test[0]
    y = train[1] - test[1]
    return x * x + y * y
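Whether multiplication actually beats ** 2 depends on the interpreter version and the operand types, so it is worth checking on your own setup. A small sketch using timeit; the function names, the test value, and the repetition count are arbitrary, and no particular outcome is assumed.

```python
import math
import timeit

def with_power(x=3.7):
    # Squares the value with the exponent operator
    return x ** 2

def with_multiply(x=3.7):
    # Squares the value with a plain multiplication
    return x * x

# Both forms should give the same result up to floating-point rounding
assert math.isclose(with_power(), with_multiply())

power_time = timeit.timeit(with_power, number=200_000)
multiply_time = timeit.timeit(with_multiply, number=200_000)
print(f"x ** 2: {power_time:.4f} s")
print(f"x * x:  {multiply_time:.4f} s")
```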
