Vectorize over only one axis in a 2D array with numpy vectorize

I have the following function to get the Euclidean distance between two vectors a and b.

import numpy as np

def distance_func(a, b):
    # Euclidean (L2) norm of the difference between the two vectors
    distance = np.linalg.norm(b - a)
    return distance

Here, I want a to be an element of an array of vectors, so I used numpy vectorize to iterate over the array (in order to get better speed than iterating with a for loop).

vfunc = np.vectorize(distance_func)

I used this as follows to get an array of Euclidean distances:

a = np.array([[1,2],[2,3],[3,4],[4,5],[5,6]])
b = np.array([1,2])

vfunc(a,b)

But this function returns:

array([[ 0.,  0.],
       [ 1.,  1.],
       [ 2.,  2.],
       [ 3.,  3.],
       [ 4.,  4.]])

This is the result of performing the operation np.linalg.norm(b-a) individually for each element of the second vector. How do I use numpy vectorize so that I get one Euclidean distance per row of a instead?

You don't need to use vectorize; you can just do:

a = np.array([[1,2],[2,3],[3,4],[4,5],[5,6]])
b = np.array([1,2])

np.linalg.norm(a-b, axis=1)

which gives:

[ 0.          1.41421356  2.82842712  4.24264069  5.65685425]

(I assume this is what you want, but if not, please also show the result you expect for your example.)
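If you do want to stay with np.vectorize, as the question asks, here is a minimal sketch. It assumes NumPy 1.12 or later, where np.vectorize accepts a signature argument. The signature '(n),(n)->()' asks it to pass the last axis of each input to the function as a whole vector, rather than broadcasting over individual scalars (which is what produced the element-wise output in the question). The NumPy documentation describes np.vectorize as a convenience that is essentially a for loop, so this should not be expected to beat the axis=1 call.

```python
import numpy as np

def distance_func(a, b):
    # Euclidean distance between two 1-D vectors
    return np.linalg.norm(b - a)

# '(n),(n)->()': each call receives two length-n vectors and returns a scalar
vfunc = np.vectorize(distance_func, signature='(n),(n)->()')

a = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
b = np.array([1, 2])

distances = vfunc(a, b)   # b is paired with each row of a in turn
print(distances.shape)    # (5,)
```

The result should match np.linalg.norm(a-b, axis=1) for the same inputs.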

If you want to compute Euclidean distances between all of your data points, you should use one of the functions provided for this purpose:

from sklearn.metrics import euclidean_distances
from scipy.spatial import distance_matrix

They are optimized to compute the distances from several points a to several points b in a fully vectorized manner.

import numpy as np
a = np.random.randn(100, 2)
b = np.random.randn(200, 2)

d1 = euclidean_distances(a, b)
d2 = distance_matrix(a, b, p=2)
print(d1.shape)  # (100, 200): one distance for each possible pair
print(d2.shape)

Speed considerations

In [90]: %timeit d1 = euclidean_distances(a, b)
1000 loops, best of 3: 403 us per loop

In [91]: %timeit d2 = distance_matrix(a, b, p=2)
1000 loops, best of 3: 699 us per loop
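For comparison, the same all-pairs distance matrix can be sketched with plain NumPy broadcasting. This is only an illustration of what "fully vectorized" means here, not how either library function is implemented, and it makes no claim about relative speed. The variable names diff and d3 are introduced for this example. Note that the intermediate array has shape (100, 200, 2), so memory use grows with the product of the two point counts.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((100, 2))
b = rng.standard_normal((200, 2))

# a[:, None, :] has shape (100, 1, 2); b[None, :, :] has shape (1, 200, 2).
# Their difference broadcasts to (100, 200, 2): one difference vector per pair.
diff = a[:, None, :] - b[None, :, :]

# Take the norm over the last axis to reduce each difference vector to a distance.
d3 = np.linalg.norm(diff, axis=-1)
print(d3.shape)  # (100, 200)
```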
