Compare the Euclidean distance of two lists of tuples with fewer comparisons in Python

I am trying to compare two lists of tuples by Euclidean distance, with a threshold of 4. If the distance between two points is less than the threshold, I increment a counter. Each tuple holds the x, y, z coordinates of a point. Is there any way to reduce the number of comparisons between list1 and list2?

  X = [ (1,2,3),(2,3,4), (4,5,6) ]
  Y = [ (1,2,2) , (3,4,5),(6,7,8) ]
  from math import sqrt
  dist_X = [ sqrt((p[0] - 0)**2 + (p[1] - 0)**2 + (p[2] - 0)**2) for p in X]
  dist_Y = [ sqrt((p[0] - 0)**2 + (p[1] - 0)**2 + (p[2] - 0)**2) for p in Y]
  for x in dist_X:
     print (x ,  [ i for i,y in enumerate(dist_Y) if abs(x-y) <= 4])

I was thinking of first calculating the Euclidean distance of each point from the origin (0,0,0), so that points that are close to each other could be matched up across the two lists, but it didn't work because the result is a scalar value. Am I going in the right direction?


   from math import sqrt

   visited1 = [(1, 2, 3), (2, 3, 4), (4, 5, 6)]
   visited2 = [(1, 2, 2), (3, 4, 5), (6, 7, 8)]

   def euclidean(a, b):
       return sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2 + (a[2] - b[2])**2)

   count = 0       # pairs closer than the threshold
   comparison = 0  # distance evaluations performed
   for p2 in visited2:
       for p1 in visited1:
           if euclidean(p2, p1) < 4:
               count += 1
           comparison += 1

Here every element of list1 is compared with every element of list2. I want to know whether there is a way to minimise the comparisons, given the (x, y, z) points that I have.

One precomputation that can sometimes speed things up is a kd-tree. I ran a quick test against brute force and found that kd-trees can be quite a bit faster for larger lists, although brute force wins for small ones:

# n = 10
# trees                 0.08512560 ms
# brute                 0.01425540 ms
# n = 100
# trees                 0.20338160 ms
# brute                 0.09876890 ms
# n = 1000
# trees                 6.40193820 ms
# brute                16.15429670 ms
# n = 10000
# trees               298.69653380 ms
# brute              1393.71134270 ms

Code:

import numpy as np
from scipy.spatial import cKDTree

import types
from timeit import timeit

def setup_data(n, k):
    data = {'d1': np.random.randint(0, 10, (n, 3)),
            'd2': np.random.randint(0, 10, (n, 3)),
            'mx': k}
    return data

def f_trees(d1, d2, mx):
    t1 = cKDTree(d1)
    t2 = cKDTree(d2)
    return t1.count_neighbors(t2, mx)

def f_brute(d1, d2, mx):
    dist2 = np.add.outer(np.einsum('ij,ij->i', d1, d1), np.einsum('ij,ij->i', d2, d2)) - 2*np.einsum('ij, kj', d1, d2)
    return np.count_nonzero(dist2 <= mx*mx)

for n in (10, 100, 1000, 10000):
    data = setup_data(n, 4)
    ref = np.array(f_trees(**data))
    print(f'n = {n}')
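    for name, func in list(globals().items()):
        # Only benchmark the module-level functions named f_*.
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            # Check the result against the kd-tree reference before timing.
            assert np.allclose(ref, func(**data))
            # timeit returns total seconds for 10 runs; *100 gives ms per run.
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(**data)', globals={'f':func, 'data':data}, number=10)*100))
        except Exception:
            print("{:16s} apparently failed".format(name[2:]))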
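Applied to the two small lists from the question, the same kd-tree calls might look like the sketch below. The names `tree1`, `tree2` and `neighbours` are only illustrative. Note that `count_neighbors` counts pairs at distance less than or equal to the radius, whereas the question uses a strict `<`; for this data no pair lies exactly at distance 4, so the two agree.

```python
import numpy as np
from scipy.spatial import cKDTree

# The two point lists from the question.
visited1 = np.array([(1, 2, 3), (2, 3, 4), (4, 5, 6)])
visited2 = np.array([(1, 2, 2), (3, 4, 5), (6, 7, 8)])
threshold = 4

# Build one tree per list (illustrative names).
tree1 = cKDTree(visited1)
tree2 = cKDTree(visited2)

# Number of (visited2, visited1) pairs with distance <= threshold.
count = tree2.count_neighbors(tree1, threshold)

# For each point in visited2, the indices of the visited1 points
# that lie within the threshold.
neighbours = tree2.query_ball_tree(tree1, threshold)

print(count)
print(neighbours)
```

If only the total is needed, `count_neighbors` is enough; `query_ball_tree` is useful when you also want to know which points matched.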

Draw this out in 2D to see how it looks. Points that are roughly the same distance from the origin are not necessarily close to each other; you have the triangle inequality backwards there. For instance, compare the points (0, 10), (0, 11), and (-8, -6). They're all 10-11 units from the origin, but the last is nowhere near the other two.
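A quick check of those three points makes this concrete. This is just a small sketch using `math.hypot` and `math.dist` (the latter needs Python 3.8 or newer); the variable names are illustrative.

```python
from itertools import combinations
from math import dist, hypot

points = [(0, 10), (0, 11), (-8, -6)]

# Distance of each point from the origin.
norms = [hypot(*p) for p in points]

# Actual distance between every pair of points.
pairwise = {(a, b): dist(a, b) for a, b in combinations(points, 2)}

for p, n in zip(points, norms):
    print(p, "is", n, "from the origin")
for (a, b), d in pairwise.items():
    print(a, "to", b, "=", round(d, 2))
```

The first and last points have the same distance from the origin, yet the distance between them is far larger than the distance between the first two.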

If you want to know whether two points are close to each other, you need to compute the distance between those two points, not the distance to an arbitrary third point. The problem is quadratic in complexity, not linear.
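That said, the distance to the origin can still serve as a filter, as long as the real distance is computed afterwards. By the reverse triangle inequality, two points can only be within the threshold of each other if their distances from the origin differ by no more than the threshold. The sketch below is one possible way to use that, not something from the question: `count_close_pairs` is a hypothetical helper, the small `slack` guards against floating-point rounding, and the worst case is still quadratic when many points share a similar distance from the origin.

```python
from bisect import bisect_left, bisect_right
from math import dist, hypot  # multi-argument hypot and dist need Python 3.8+

def count_close_pairs(points_a, points_b, threshold, slack=1e-9):
    """Hypothetical helper: count pairs with distance < threshold.

    Returns (count, comparisons), where comparisons is the number of
    real point-to-point distances that were evaluated.
    """
    # Sort the second list by distance from the origin.
    b_sorted = sorted(points_b, key=lambda p: hypot(*p))
    b_norms = [hypot(*p) for p in b_sorted]

    count = 0
    comparisons = 0
    for a in points_a:
        norm_a = hypot(*a)
        # Only points whose norm is within the threshold of norm_a can
        # possibly be close to a; everything else is skipped.
        lo = bisect_left(b_norms, norm_a - threshold - slack)
        hi = bisect_right(b_norms, norm_a + threshold + slack)
        for b in b_sorted[lo:hi]:
            comparisons += 1
            # The real distance still decides the outcome.
            if dist(a, b) < threshold:
                count += 1
    return count, comparisons

visited1 = [(1, 2, 3), (2, 3, 4), (4, 5, 6)]
visited2 = [(1, 2, 2), (3, 4, 5), (6, 7, 8)]
count, comparisons = count_close_pairs(visited1, visited2, 4)
print(count, comparisons)
```

The filter only ever removes pairs that cannot match, so the count is the same as with the full double loop; how many comparisons it saves depends entirely on how the points are spread out.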

