
More efficient way to calculate distance in numpy?

I have a question about how to calculate distances in numpy as fast as possible:

import time
import numpy

def getR1(VVm,VVs,HHm,HHs):
    t0=time.time()
    R=VVs.flatten()[numpy.newaxis,:]-VVm.flatten()[:,numpy.newaxis]
    R*=R
    R1=HHs.flatten()[numpy.newaxis,:]-HHm.flatten()[:,numpy.newaxis]
    R1*=R1
    R+=R1
    del R1
    print "R1\t",time.time()-t0, R.shape, #11.7576191425 (108225, 10500) 
    print numpy.max(R) #4176.26290975
    # uses 17.5Gb ram
    return R


def getR2(VVm,VVs,HHm,HHs):
    t0=time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    deltas = precomputed_flat[None,:,:] - measured_flat[:, None, :]
    #print time.time()-t0, deltas.shape # 5.861109972 (108225, 10500, 2)
    R = numpy.einsum('ijk,ijk->ij', deltas, deltas)
    print "R2\t",time.time()-t0,R.shape, #14.5291359425 (108225, 10500)
    print numpy.max(R) #4176.26290975
    # uses 26Gb ram
    return R


def getR3(VVm,VVs,HHm,HHs):
    from numpy.core.umath_tests import inner1d
    t0=time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    deltas = precomputed_flat[None,:,:] - measured_flat[:, None, :]
    #print time.time()-t0, deltas.shape # 5.861109972 (108225, 10500, 2)
    R = inner1d(deltas, deltas)
    print "R3\t",time.time()-t0, R.shape, #12.6972110271 (108225, 10500)
    print numpy.max(R) #4176.26290975
    #Uses 26Gb
    return R


def getR4(VVm,VVs,HHm,HHs):
    import scipy.spatial.distance as spdist
    t0=time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    R=spdist.cdist(precomputed_flat,measured_flat, 'sqeuclidean') #.T
    print "R4\t",time.time()-t0, R.shape, #17.7022118568 (108225, 10500)
    print numpy.max(R) #4176.26290975
    # uses 9 Gb ram
    return R

def getR5(VVm,VVs,HHm,HHs):
    import scipy.spatial.distance as spdist
    t0=time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    R=spdist.cdist(precomputed_flat,measured_flat, 'euclidean') #.T
    print "R5\t",time.time()-t0, R.shape, #15.6070930958 (108225, 10500)
    print numpy.max(R) #64.6240118667
    # uses only 9 Gb ram
    return R

def getR6(VVm,VVs,HHm,HHs):
    from scipy.weave import blitz
    t0=time.time()
    R=VVs.flatten()[numpy.newaxis,:]-VVm.flatten()[:,numpy.newaxis]
    blitz("R=R*R") # R*=R
    R1=HHs.flatten()[numpy.newaxis,:]-HHm.flatten()[:,numpy.newaxis]
    blitz("R1=R1*R1") # R1*=R1
    blitz("R=R+R1") # R+=R1
    del R1
    print "R6\t",time.time()-t0, R.shape, #11.7576191425 (108225, 10500) 
    print numpy.max(R) #4176.26290975
    return R

This results in the following times:

R1  11.7737319469 (108225, 10500) 4909.66881791
R2  15.1279799938 (108225, 10500) 4909.66881791
R3  12.7408981323 (108225, 10500) 4909.66881791
R4  17.3336868286 (10500, 108225) 4909.66881791
R5  15.7530870438 (10500, 108225) 70.0690289494
R6  11.670968771 (108225, 10500) 4909.66881791

The euclidean one (R5) gives sqrt((VVm-VVs)^2+(HHm-HHs)^2), while the others give (VVm-VVs)^2+(HHm-HHs)^2. This is not really important, since further on in my code I take the minimum of R[i,:] for each i, and the sqrt does not influence which value is the minimum anyway (and if I am interested in the actual distance, I just take sqrt(value) of that minimum instead of applying sqrt over the entire array), so there is really no timing difference due to that.
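For example, a minimal sketch of that idea (the names nearest_idx and nearest_dist are just illustrative, assuming R holds the squared distances returned by getR1):

# index of the nearest precomputed point for each measured point; argmin does not need sqrt
nearest_idx = numpy.argmin(R, axis=1)
# take sqrt only of the selected minima instead of the whole (108225, 10500) array
nearest_dist = numpy.sqrt(R[numpy.arange(R.shape[0]), nearest_idx])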

The question remains: how come the first solution is the best? (The reason the second and third are slower is that the deltas=... line alone takes 5.8 seconds, which is also why those two methods need 26 GB of RAM.) And why is sqeuclidean slower than euclidean?

sqeuclidean should just compute (VVm-VVs)^2+(HHm-HHs)^2, but I think it does something different. Does anyone know how to find the source code (C or whatever is at the bottom) of that method? I think it does sqrt((VVm-VVs)^2+(HHm-HHs)^2)^2 (the only reason I can think of why it would be slower than (VVm-VVs)^2+(HHm-HHs)^2 - I know it's a stupid reason; does anyone have a more logical one?).
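As a quick sanity check (just a sketch on small random arrays of my own, not the original data), the two metrics should only differ by the final square root:

import numpy
from scipy.spatial.distance import cdist

a = numpy.random.rand(1000, 2)
b = numpy.random.rand(500, 2)
d_sq = cdist(a, b, 'sqeuclidean')
d_eu = cdist(a, b, 'euclidean')
print(numpy.allclose(d_sq, d_eu ** 2))  # expected to be True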

Since I know nothing of C, how would I inline this with scipy.weave? And can that code be compiled and run normally, the way you run Python, or do I need to install something special for that?

Edit: OK, I tried it with scipy.weave.blitz (the R6 method), and that is slightly faster, but I assume someone who knows more C than me can still improve this speed? I just took the lines of the form a+=b or a*=b, looked up how they would look in C, and put them in the blitz statements. I guess that if I also put the lines with flatten and newaxis into C, it should go faster still, but I don't know how to do that (someone who knows C, maybe you can explain?). Right now, the difference between the blitz version and my first method is not big enough to really be caused by C vs numpy, I guess?

I guess the other methods, like the ones with deltas=..., could also go much faster if I put them in C?

Whenever you have multiplications and sums, try to use one of the dot product functions or np.einsum. Since you are preallocating your arrays, rather than having different arrays for horizontal and vertical coordinates, stack them both together:

precomputed_flat = np.column_stack((svf.flatten(), shf.flatten()))
measured_flat = np.column_stack((VVmeasured.flatten(), HHmeasured.flatten()))
deltas = precomputed_flat - measured_flat[:, None, :]

From here, the simplest would be:

dist = np.einsum('ijk,ijk->ij', deltas, deltas)

You could also try something like:

from numpy.core.umath_tests import inner1d
dist = inner1d(deltas, deltas)

There is of course also cdist from SciPy's spatial module:

from scipy.spatial.distance import cdist
dist = cdist(precomputed_flat, measured_flat, 'euclidean')
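If the square root is not needed (as in the question, where only the minimum per row matters), the same call can use the 'sqeuclidean' metric from R4 instead, which skips it:

dist_sq = cdist(precomputed_flat, measured_flat, 'sqeuclidean')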

EDIT: I cannot run tests on such a large dataset, but these timings are rather enlightening:

import numpy as np

len_a, len_b = 10000, 1000

a = np.random.rand(2, len_a)
b = np.random.rand(2, len_b)
c = np.random.rand(len_a, 2)
d = np.random.rand(len_b, 2)

In [3]: %timeit a[:, None, :] - b[..., None]
10 loops, best of 3: 76.7 ms per loop

In [4]: %timeit c[:, None, :] - d
1 loops, best of 3: 221 ms per loop

For the above smaller dataset, I can get a slight speed-up over your method with scipy.spatial.distance.cdist, and match it with inner1d, by arranging the data differently in memory:

precomputed_flat = np.vstack((svf.flatten(), shf.flatten()))
measured_flat = np.vstack((VVmeasured.flatten(), HHmeasured.flatten()))
deltas = precomputed_flat[:, None, :] - measured_flat[..., None]

import scipy.spatial.distance as spdist
from numpy.core.umath_tests import inner1d

In [13]: %timeit r0 = a[0, None, :] - b[0, :, None]; r1 = a[1, None, :] - b[1, :, None]; r0 *= r0; r1 *= r1; r0 += r1
10 loops, best of 3: 146 ms per loop

In [14]: %timeit deltas = (a[:, None, :] - b[..., None]).T; inner1d(deltas, deltas)
10 loops, best of 3: 145 ms per loop

In [15]: %timeit spdist.cdist(a.T, b.T)
10 loops, best of 3: 124 ms per loop

In [16]: %timeit deltas = a[:, None, :] - b[..., None]; np.einsum('ijk,ijk->jk', deltas, deltas)
10 loops, best of 3: 163 ms per loop
