
Fastest way to calculate Euclidean and Minkowski distance between all the vectors in a list of lists in Python

I have been trying for a while now to calculate the Euclidean and Minkowski distances between all the vectors in a list of lists. I don't have much advanced mathematical knowledge.

  • I am usually working with 4- or 5-dimensional vectors.
  • The vector list can range in size from 0 to around 200,000 vectors.
  • When calculating the distances, all the vectors have the same number of dimensions.

I have relied on these two questions during the process:

  • python numpy euclidean distance calculation between matrices of row vectors
  • Calculate Euclidean Distance between all the elements in a list of lists python

At first, my code looked like this:

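import numpy as np

def euclidean_distance_np(vec_list, single_vec):
    # Distance from single_vec to every vector (row) in vec_list
    diff = np.asarray(vec_list, dtype=float) - np.asarray(single_vec, dtype=float)
    return np.sqrt(np.sum(diff ** 2, axis=1))

def minkowski_distance_np(vec_list, single_vec, p_val):
    # Use a float dtype so non-integer coordinates are not truncated
    # and large differences raised to p_val do not overflow
    diff = np.abs(np.asarray(vec_list, dtype=float) - np.asarray(single_vec, dtype=float))
    return (diff ** p_val).sum(axis=1) ** (1 / p_val)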
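For reference, minkowski_distance_np computes the Minkowski distance of order p (p_val) between single_vec and each vector in the list. Writing u and v for two n-dimensional vectors (notation introduced here only for illustration), the standard definition is shown below. With p = 2 it reduces to the Euclidean distance that euclidean_distance_np computes, and with p = 1 it is the Manhattan distance; it is a proper metric for p ≥ 1.

```latex
\operatorname{dist}_p(\mathbf{u}, \mathbf{v})
  = \Bigl( \sum_{k=1}^{n} \lvert u_k - v_k \rvert^{p} \Bigr)^{1/p},
\qquad
\operatorname{dist}_2(\mathbf{u}, \mathbf{v})
  = \sqrt{\sum_{k=1}^{n} (u_k - v_k)^{2}}
```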

This worked well when I had a small number of vectors. I would calculate the distance from a single vector to all the vectors in the list, and then repeat the process for every vector in the list, one by one. But once the list reached five or six digits in length (tens or hundreds of thousands of vectors), these functions became extremely slow.
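The one-vector-at-a-time approach described above can be sketched as a loop that fills one row of the distance matrix per iteration. This is a minimal sketch assuming all vectors share one dimension; minkowski_row and pairwise_by_rows are hypothetical names. Each iteration is vectorized, but the loop runs N times over N vectors, so the total work grows with N², which fits the slowdown the question describes.

```python
import numpy as np

def minkowski_row(vecs, single_vec, p_val):
    # Hypothetical helper: distances from single_vec to every row of vecs
    diff = np.abs(vecs - single_vec)
    return (diff ** p_val).sum(axis=1) ** (1.0 / p_val)

def pairwise_by_rows(vec_list, p_val):
    # Hypothetical helper: build the N x N matrix one row per loop iteration
    vecs = np.asarray(vec_list, dtype=float)
    n = len(vecs)
    out = np.empty((n, n))
    for i in range(n):
        out[i] = minkowski_row(vecs, vecs[i], p_val)
    return out

vecs = [[0, 0, 0, 0], [3, 4, 0, 0], [1, 1, 1, 1]]
d = pairwise_by_rows(vecs, 2)
print(d[0, 1])  # 5.0
```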

I managed to speed up the Euclidean distance calculation like this:

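# One 1-D array per coordinate (this version assumes 5-dimensional vectors)
x = np.array([v[0] for v in vec_list])
y = np.array([v[1] for v in vec_list])
z = np.array([v[2] for v in vec_list])
w = np.array([v[3] for v in vec_list])
t = np.array([v[4] for v in vec_list])

# x - x.reshape(-1, 1) broadcasts to an (N, N) matrix of coordinate differences,
# so res[i, j] is the Euclidean distance between vec_list[i] and vec_list[j]
res = np.sqrt(np.square(x - x.reshape(-1, 1)) + np.square(y - y.reshape(-1, 1))
              + np.square(z - z.reshape(-1, 1)) + np.square(w - w.reshape(-1, 1))
              + np.square(t - t.reshape(-1, 1)))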
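The hand-written arrays x, y, z, w, t tie the code above to 5-dimensional vectors. A minimal sketch of the same broadcasting idea for any dimension, assuming all vectors have equal length, builds a 3-D array of differences instead (euclidean_matrix is a hypothetical name). It gives the same result as res, but it materializes an N × N × d intermediate array, so it needs considerably more memory than working one coordinate at a time.

```python
import numpy as np

def euclidean_matrix(vec_list):
    """Hypothetical helper: all-pairs Euclidean distances for any (equal) dimension d."""
    a = np.asarray(vec_list, dtype=float)           # shape (N, d)
    if len(a) == 0:
        return np.zeros((0, 0))
    # (N, 1, d) - (1, N, d) broadcasts to (N, N, d): diff[i, j] = a[i] - a[j]
    diff = a[:, None, :] - a[None, :, :]
    # Sum squared differences over the coordinate axis, then take the square root
    return np.sqrt((diff ** 2).sum(axis=-1))

vecs = [[0, 0, 0, 0, 0], [3, 4, 0, 0, 0], [1, 2, 3, 4, 5]]
print(euclidean_matrix(vecs)[0, 1])  # 5.0
```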

But I cannot figure out how to adapt this method to correctly calculate the Minkowski distance. So, to be precise, my question is: how can I calculate the Minkowski distance in a way similar to the code above?
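One way to adapt the per-coordinate code to the Minkowski distance follows directly from the definition: replace each np.square(...) with np.abs(...) ** p and the outer np.sqrt with ** (1 / p). The sketch below does that with a loop over the coordinates instead of named variables, so it works for any dimension and keeps only one (N, N) accumulator in memory. It assumes equal-length vectors, and minkowski_matrix is a hypothetical name; with p = 2 it reproduces res.

```python
import numpy as np

def minkowski_matrix(vec_list, p):
    """Hypothetical helper: pairwise Minkowski distances, one coordinate at a time."""
    a = np.asarray(vec_list, dtype=float)       # shape (N, d)
    if len(a) == 0:
        return np.zeros((0, 0))
    n = a.shape[0]
    acc = np.zeros((n, n))
    for k in range(a.shape[1]):                 # plays the role of x, y, z, w, t
        col = a[:, k]
        # col - col.reshape(-1, 1) broadcasts to an (N, N) matrix of differences
        acc += np.abs(col - col.reshape(-1, 1)) ** p
    return acc ** (1.0 / p)

vecs = [[0, 0, 0, 0, 0], [3, 4, 0, 0, 0], [1, 2, 3, 4, 5]]
print(minkowski_matrix(vecs, 1)[0, 1])  # 7.0
print(minkowski_matrix(vecs, 2)[0, 1])  # 5.0
```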

I would also appreciate any ideas for improvements, or better ways to perform the calculations.

SciPy already implements these distance functions: minkowski and euclidean (in scipy.spatial.distance). But what you probably need is cdist, which computes the distance between every pair of vectors drawn from two collections.
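A minimal sketch of the suggested cdist call, assuming SciPy is installed and the vectors fit in a NumPy array of shape (N, d). Passing the same array as both arguments gives all pairwise distances, and metric="minkowski" with the keyword p selects the order. The single-pair minkowski function appears only as a cross-check.

```python
import numpy as np
from scipy.spatial.distance import cdist, minkowski

vecs = np.array([[0, 0, 0, 0, 0],
                 [3, 4, 0, 0, 0],
                 [1, 2, 3, 4, 5]], dtype=float)

# cdist(XA, XB, ...) returns a len(XA) x len(XB) matrix of distances
euclid = cdist(vecs, vecs, metric="euclidean")
mink3 = cdist(vecs, vecs, metric="minkowski", p=3)

# Distances from one vector to all others: pass a 2-D (1, d) slice as XA
row0 = cdist(vecs[:1], vecs, metric="minkowski", p=3)   # shape (1, 3)

# Cross-check one entry against the single-pair function
assert np.isclose(mink3[0, 1], minkowski(vecs[0], vecs[1], p=3))
print(euclid[0, 1])  # 5.0
```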

NumPy is a great tool for matrix manipulation, but it doesn't contain every possible function. You can find most additional features and operations in SciPy, which is more oriented toward mathematics, science, and engineering.
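SciPy also provides pdist, which computes each unordered pair only once and returns a condensed vector of N(N-1)/2 distances; squareform expands it to the full N × N matrix when that is actually needed. At the upper end of the stated range, though, no full matrix fits in ordinary RAM: for N = 200,000, an N × N float64 matrix takes 200,000² × 8 bytes = 3.2 × 10¹¹ bytes (about 320 GB), and the condensed form roughly half that. The sketch below, assuming the distances can be consumed block by block, streams row blocks through cdist; minkowski_blocks and block_rows are hypothetical names, and the nearest-neighbour search is only an example of per-block processing.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

def minkowski_blocks(vecs, p, block_rows=1000):
    """Hypothetical helper: yield (start, block) where block[i, j] is the
    distance from vecs[start + i] to vecs[j]; one block in memory at a time."""
    vecs = np.asarray(vecs, dtype=float)
    for start in range(0, len(vecs), block_rows):
        chunk = vecs[start:start + block_rows]
        yield start, cdist(chunk, vecs, metric="minkowski", p=p)

rng = np.random.default_rng(0)
data = rng.random((2500, 5))

# Small N: condensed distances (each pair computed once), expanded to N x N
full = squareform(pdist(data, metric="minkowski", p=3))

# Large N: reduce each row block instead of keeping the whole matrix
nearest = np.empty(len(data), dtype=int)
for start, block in minkowski_blocks(data, p=3):
    rows = np.arange(block.shape[0])
    block[rows, start + rows] = np.inf   # ignore each vector's distance to itself
    nearest[start:start + block.shape[0]] = block.argmin(axis=1)
```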

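Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.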

 
© 2020-2024 STACKOOM.COM