简体   繁体   中英

How to ensure minimum euclidean distance in a list of tuples

I have an extremely large list of coordinates in the form of a list of tuples.

data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21),(1,2),(2,1)]

The list of tuple is actually being formed by a for loop with an append command like so:

data = []
    for i in source: # where i a tuple of form (x,y)
        data.append(i)

Is there an approach to ensure euclidean distance between all tuples is above a certain threshold? In this example there is a very small distance between (1,1),(1,2),(2,1). In this scenario I would like to keep only one of the 3 tuples. Resulting in either one of these new list of tuples:

data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21)]
data = [(2,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21)]
data = [(1,2),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21)]

I have a brute force algorithm that iterates through the list but there should be a more elegant way or quicker way to do this? Or is there any other methods to speed up this operation? I am expecting lists of ~70k up to 500k tuples.

My method:

from scipy.spatial.distance import euclidean
data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21),(1,2),(2,1)]
new_data = []
while len(data) >0:
    
    check = data.pop()
    flag = True
    for i in data:
         if euclidean(check,i) < 5:
              flag = False
              break
         else:
              pass
    if flag == True:
        new_data.append(check)
    else:
        flag = True
    
         

Additional points: Although the list of tuples is coming from some iterative function, the order of tuples is uncertain. Actual number of tuples is unknown until end of for loop. I would rather avoid multiprocessing/multithreading for speed up in this scenario. If necessary I can put up some timings but I dont think its necessary. The solution I have right now is time O(n(n-1)/2) and space complexity of O(n) I think so any improvement would be better.

You can organize your 2D data/tuples using a Quadtree .

Quadtrees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions.

you can use numpy try this:

import numpy as np

data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21),(1,2),(2,1)]
start_time = time.time()
#transform to numpy array
a = np.array(data)
subs = a[:,None] - a
#calculate ecludien distance between all element
dist=np.sqrt(np.einsum('ijk,ijk->ij',subs,subs))
#replace 0 to 5 because distance distance between identic element will be 0
dist=np.where(dist == 0, 5, dist)
#select element where distance sup to 5
dist_bool=[dist[:,0] < 5]
#select element where distance sup to 5 are false 
a=a[dist_bool[0] == False]
print("--- %s seconds ---" % (time.time() - start_time))#got --- 0.00020575523376464844 seconds ---

when we compare to your soltion:

start_time = time.time()
new_data = []
while len(data) >0:
    check = data.pop()
    flag = True
    for i in data:
         if euclidean(check,i) < 5:
              flag = False
              break
         else:
              pass
    if flag == True:
        new_data.append(check)
    else:
        flag = True
print("--- %s seconds ---" % (time.time() - start_time))# got ---0.001013040542602539 seconds ---

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM