What is wrong with the following code for computing distance between all pairs of vectors?

I'm trying to find the Manhattan distance between all pairs of vectors.

import numpy as np
import itertools

class vector:
    def __init__(self):
        self.a = 0
        self.b = 0

c = vector()
d = vector()
l = vector()
m = vector()

e = [c,d]
n = [l,m]
o = np.array(n)
f = np.array(e)
p = itertools.product(o,f)
p = list(p)
def comp(x):
    return (x[0].a-x[1].a) + (x[0].b-x[1].b)

g = np.vectorize(comp)
print g(p)

I get the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 2207, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 2270, in _vectorize_call
    ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
  File "/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 2232, in _get_ufunc_and_otypes
    outputs = func(*inputs)
  File "<stdin>", line 2, in comp
AttributeError: vector instance has no attribute '__getitem__'

I have to say I'd approach this differently. NumPy doesn't deal well with arrays of arbitrary Python objects, such as instances of your own classes.

Your class

class vector:
    def __init__(self):
        self.a = 0
        self.b = 0

is basically a length-2 vector. So, if you're going to operate on many length-2 vectors, I'd suggest something like this:

In [13]: p = np.array([[1, 2], [3, 4], [5, 6]])

In [14]: p
Out[14]: 
array([[1, 2],
       [3, 4],
       [5, 6]])

Each row is a length-2 vector, and there are 3 such vectors. This is far more efficient than a Python list of Python class instances.
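If the vectors already exist as objects, packing them into one such array is a single list comprehension. A minimal sketch, assuming a stand-in `Vector` class like the question's (constructor arguments added for convenience) and a hypothetical helper `to_array` that is not from the original post:

```python
import numpy as np

class Vector:
    """Stand-in for the question's class: two numeric components a and b."""
    def __init__(self, a=0.0, b=0.0):
        self.a = a
        self.b = b

def to_array(vectors):
    """Hypothetical helper: pack Vector objects into an (N, 2) float array.

    reshape(-1, 2) keeps the result two-dimensional even for an empty list.
    """
    return np.array([[v.a, v.b] for v in vectors], dtype=float).reshape(-1, 2)

vecs = [Vector(1, 2), Vector(3, 4), Vector(5, 6)]
arr = to_array(vecs)
print(arr.shape)
```

After this conversion, every row of `arr` plays the role of one object, and the per-pair work can be done with whole-array operations instead of Python attribute lookups.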

Now your comp function

def comp(x):
    return (x[0].a-x[1].a) + (x[0].b-x[1].b)

is equivalent to

def comp(x):
    return (x[0].a+x[0].b) - (x[1].a+x[1].b)

i.e., the component sum of the first vector minus the component sum of the second vector. That being the case, you can efficiently calculate the pairwise outputs via

In [15]: q = p.sum(axis=1)

to calculate the component sum of each vector, followed by

In [16]: np.subtract.outer(q, q)
Out[16]: 
array([[ 0, -4, -8],
       [ 4,  0, -4],
       [ 8,  4,  0]])
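Note that comp, as written, is not a Manhattan distance: it has no absolute values, so it can be negative, as the matrix above shows. A minimal NumPy broadcasting sketch that computes both the signed difference of component sums (the same numbers as `np.subtract.outer(q, q)`) and the actual Manhattan (L1) distance for the same three vectors:

```python
import numpy as np

p = np.array([[1, 2], [3, 4], [5, 6]])  # one length-2 vector per row

# Signed difference of component sums, i.e. what comp computes.
q = p.sum(axis=1)
signed = q[:, None] - q[None, :]  # same values as np.subtract.outer(q, q)

# Manhattan distance: sum of absolute component differences.
# p[:, None, :] - p[None, :, :] has shape (N, N, 2).
manhattan = np.abs(p[:, None, :] - p[None, :, :]).sum(axis=-1)

print(signed)
print(manhattan)
```

For two different sets `x` and `y` (like `o` and `f` in the question), the same idea is `np.abs(x[:, None, :] - y[None, :, :]).sum(axis=-1)`. The intermediate array has shape (N, M, 2), so memory grows with the number of pairs.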

The way you've written comp, it expects to be called with a 2-tuple as its argument, but that's not what happens. In your code, p is a list of tuples. When you call a vectorized function on it, the list is converted to a NumPy array, and the tuples are split into separate columns, so you get a 4x2 array. Your function is then called on each cell of this array, so it gets called with just one vector object as its argument. That is why x[0] fails with the AttributeError about '__getitem__'.
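A small sketch to see this, assuming a stand-in `Vector` class like the question's (constructor arguments added for convenience). Converting the list of pairs gives a 4x2 object array, and `np.vectorize` calls the function once per cell. Passing the two sets as separate arguments, shaped so they broadcast against each other, instead gives the function one object from each set per call. The two-argument `manhattan` function is illustrative; it adds the absolute values that the question's comp lacks.

```python
import itertools
import numpy as np

class Vector:
    def __init__(self, a=0, b=0):
        self.a = a
        self.b = b

e = [Vector(0, 0), Vector(1, 1)]
n = [Vector(2, 3), Vector(4, 5)]

pairs = list(itertools.product(n, e))
print(np.array(pairs, dtype=object).shape)  # the tuples became columns

# Record what the vectorized function actually receives on each call.
received = []
def record(x):
    received.append(type(x).__name__)
    return 0

# otypes is given so vectorize does not make an extra trial call.
np.vectorize(record, otypes=[int])(pairs)
print(set(received))  # one Vector per call, never a pair

# Fix: take two arguments and let broadcasting form the pairs.
def manhattan(u, v):
    return abs(u.a - v.a) + abs(u.b - v.b)

g = np.vectorize(manhattan, otypes=[int])
left = np.array(n, dtype=object)[:, None]   # shape (2, 1)
right = np.array(e, dtype=object)[None, :]  # shape (1, 2)
D = g(left, right)  # shape (2, 2); D[i, j] is the distance from n[i] to e[j]
print(D)
```

This makes the call shape correct, but `np.vectorize` is still a Python-level loop over the objects, so it is a convenience rather than a speedup.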

It's not really clear what you're trying to accomplish here. If your objects are not numbers, you won't gain anything by using things like np.vectorize on them; you should just call your function in a loop. If your objects are numbers, then store them in an ordinary NumPy array and use better ways to compute such distances, like the pdist function in scipy.spatial.distance.
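Both routes from the paragraph above, sketched side by side under the same assumptions (a stand-in `Vector` class and an illustrative `manhattan` helper). The numeric route uses SciPy's `scipy.spatial.distance` functions with `metric='cityblock'`, which is SciPy's name for Manhattan distance. `pdist` computes distances within one set and returns them in condensed form (expanded here with `squareform`). `cdist` computes distances between two sets, which matches the question's product of two lists.

```python
import itertools
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

class Vector:
    def __init__(self, a=0.0, b=0.0):
        self.a = a
        self.b = b

def manhattan(u, v):
    return abs(u.a - v.a) + abs(u.b - v.b)

n = [Vector(2, 3), Vector(4, 5)]
e = [Vector(0, 0), Vector(1, 1)]

# Route 1: objects that are not plain numbers, so just loop over the pairs.
loop_result = {(i, j): manhattan(u, v)
               for (i, u), (j, v) in itertools.product(enumerate(n), enumerate(e))}

# Route 2: numbers, so store them in ordinary arrays and let SciPy do the work.
X = np.array([[v.a, v.b] for v in n], dtype=float)
Y = np.array([[v.a, v.b] for v in e], dtype=float)
between = cdist(X, Y, metric='cityblock')          # shape (len(n), len(e))
within = squareform(pdist(X, metric='cityblock'))  # shape (len(n), len(n))

print(loop_result)
print(between)
print(within)
```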

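Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.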

 