
Compare the Euclidean distances between two lists of tuples with fewer comparisons in Python

I am trying to calculate the Euclidean distance between two lists of tuples, with a threshold of 4. If the distance is less than the threshold, a counter is incremented. Each tuple is the x, y, z coordinate of a point. Is there any way I can reduce the number of comparisons of list1 with list2?

    from math import sqrt

    X = [(1, 2, 3), (2, 3, 4), (4, 5, 6)]
    Y = [(1, 2, 2), (3, 4, 5), (6, 7, 8)]

    # distance of every point from the origin (0, 0, 0)
    dist_X = [sqrt(p[0]**2 + p[1]**2 + p[2]**2) for p in X]
    dist_Y = [sqrt(p[0]**2 + p[1]**2 + p[2]**2) for p in Y]

    # for each point in X, list the indices of points in Y whose
    # origin distance differs by at most 4
    for x in dist_X:
        print(x, [i for i, y in enumerate(dist_Y) if abs(x - y) <= 4])

I was thinking of first calculating the Euclidean distance of each point from the origin (0, 0, 0), so that both lists would then contain the points that are close to each other, but it didn't work because the result is just a scalar value. Am I going in the right direction?

EDIT

    from math import sqrt

    visited1 = [(1, 2, 3), (2, 3, 4), (4, 5, 6)]
    visited2 = [(1, 2, 2), (3, 4, 5), (6, 7, 8)]

    def euclidean(a, b):
        return sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2 + (a[2] - b[2])**2)

    count = 0
    comparison = 0
    for p in visited2:
        for q in visited1:
            if euclidean(p, q) < 4:
                count += 1
            comparison += 1

Here every element of list1 is compared with every element of list2. I want to know if there is a way I can minimise the number of comparisons, given the (x, y, z) points that I have.

One precomputation that can sometimes speed things up is a k-d tree. I ran a quick test against brute force and found that it can be quite a bit faster for larger lists:

# n = 10
# trees                 0.08512560 ms
# brute                 0.01425540 ms
# n = 100
# trees                 0.20338160 ms
# brute                 0.09876890 ms
# n = 1000
# trees                 6.40193820 ms
# brute                16.15429670 ms
# n = 10000
# trees               298.69653380 ms
# brute              1393.71134270 ms

Code:

import numpy as np
from scipy.spatial import cKDTree

import types
from timeit import timeit

def setup_data(n, k):
    # two random clouds of n integer (x, y, z) points each, plus the threshold mx
    data = {'d1': np.random.randint(0, 10, (n, 3)),
            'd2': np.random.randint(0, 10, (n, 3)),
            'mx': k}
    return data

def f_trees(d1, d2, mx):
    # build a k-d tree on each point set and count pairs within distance mx
    t1 = cKDTree(d1)
    t2 = cKDTree(d2)
    return t1.count_neighbors(t2, mx)

def f_brute(d1, d2, mx):
    # full matrix of squared distances: |x|^2 + |y|^2 - 2 x.y, then count entries <= mx^2
    dist2 = np.add.outer(np.einsum('ij,ij->i', d1, d1), np.einsum('ij,ij->i', d2, d2)) - 2*np.einsum('ij, kj', d1, d2)
    return np.count_nonzero(dist2 <= mx*mx)



for n in (10, 100, 1000, 10000):
    data = setup_data(n, 4)
    ref = np.array(f_trees(**data))
    print(f'n = {n}')
    # pick up every function in this module whose name starts with 'f_'
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            # sanity-check the result against the k-d tree reference, then time 10 runs
            assert np.allclose(ref, func(**data))
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(**data)', globals={'f':func, 'data':data}, number=10)*100))
        except:
            print("{:16s} apparently failed".format(name[2:]))

Draw this out in 2D to see how it looks. Points that are roughly the same distance from the origin are not necessarily close to each other -- you have the triangle inequality backwards, there. For instance, compare the points (0, 10), (0, 11), and (-8, -6). They're all 10-11 units from the origin, but the last is nowhere near the other two.
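
To make that concrete, a quick check shows all three points are 10-11 units from the origin, yet only the first two are anywhere near each other (math.dist requires Python 3.8+):

    from math import dist  # Python 3.8+

    a, b, c = (0, 10), (0, 11), (-8, -6)
    origin = (0, 0)

    print(dist(a, origin), dist(b, origin), dist(c, origin))  # 10.0 11.0 10.0
    print(dist(a, b))  # 1.0
    print(dist(a, c))  # ~17.9
    print(dist(b, c))  # ~18.8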

If you want to know whether two points are close to each other, you need to compute the distance between those two points, not the distance to an arbitrary third point. The problem is quadratic in complexity, not linear.
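
That said, the precomputed origin distances are not entirely wasted: by the reverse triangle inequality, |dist(p, 0) - dist(q, 0)| <= dist(p, q), so a large gap in origin distances can rule a pair out, while a small gap never rules one in. A minimal sketch of using them purely as a pre-filter, with the real distance still computed for the pairs that survive:

    from math import sqrt

    def euclidean(a, b):
        return sqrt(sum((x - y)**2 for x, y in zip(a, b)))

    visited1 = [(1, 2, 3), (2, 3, 4), (4, 5, 6)]
    visited2 = [(1, 2, 2), (3, 4, 5), (6, 7, 8)]
    threshold = 4

    origin = (0, 0, 0)
    norms1 = [euclidean(p, origin) for p in visited1]
    norms2 = [euclidean(q, origin) for q in visited2]

    count = 0
    for p, dp in zip(visited1, norms1):
        for q, dq in zip(visited2, norms2):
            # |dp - dq| <= dist(p, q), so a gap larger than the threshold
            # means the pair cannot possibly be close -- skip the full check
            if abs(dp - dq) > threshold:
                continue
            if euclidean(p, q) < threshold:
                count += 1
    print(count)

Every pair is still visited, so this only saves the full distance computation for pairs that are obviously far apart; for an actual reduction in comparisons, a spatial index such as the k-d tree above is the appropriate tool.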
