How to ensure minimum Euclidean distance in a list of tuples
I have an extremely large list of coordinates in the form of a list of tuples.
data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21),(1,2),(2,1)]
The list of tuples is actually being formed by a for loop with an append command, like so:
data = []
for i in source:  # where i is a tuple of the form (x, y)
    data.append(i)
Is there an approach to ensure that the Euclidean distance between all tuples is above a certain threshold? In this example there is a very small distance between (1,1), (1,2) and (2,1). In this scenario I would like to keep only one of the three tuples, resulting in any one of these new lists of tuples:
data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21)]
data = [(2,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21)]
data = [(1,2),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21)]
I have a brute-force algorithm that iterates through the list, but there should be a more elegant or quicker way to do this. Or is there any other method to speed up this operation? I am expecting lists of ~70k up to 500k tuples.
My method:
from scipy.spatial.distance import euclidean

data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21),(1,2),(2,1)]
new_data = []
while len(data) > 0:
    check = data.pop()
    flag = True
    for i in data:
        if euclidean(check, i) < 5:
            flag = False
            break
    if flag:
        new_data.append(check)
Additional points: Although the list of tuples comes from some iterative function, the order of the tuples is uncertain, and the actual number of tuples is unknown until the end of the for loop. I would rather avoid multiprocessing/multithreading for speed-up in this scenario. If necessary I can put up some timings, but I don't think it's necessary. The solution I have right now is O(n(n-1)/2), i.e. O(n²), time and O(n) space, I think, so any improvement would be welcome.
You can organize your 2D data/tuples using a quadtree. Quadtrees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions.
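A quadtree is one option; since the threshold here is a fixed constant, a uniform grid (spatial hash) with cell size equal to the threshold gives the same locality with less code. This is only a sketch, not code from the question or answer; `filter_min_dist` and the cell-size choice are my own:

```python
import math

def filter_min_dist(points, min_dist):
    """Greedily keep points so that all kept points are >= min_dist apart.

    With cell size min_dist, any point closer than min_dist to p must lie
    in p's grid cell or one of its 8 neighbours, so each point is compared
    against a handful of candidates instead of the whole list -- roughly
    O(n) for well-spread data instead of O(n^2).
    """
    grid = {}   # (cell_x, cell_y) -> points already kept in that cell
    kept = []
    for p in points:
        cx, cy = int(p[0] // min_dist), int(p[1] // min_dist)
        ok = True
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if any(math.dist(p, q) < min_dist
                       for q in grid.get((cx + dx, cy + dy), ())):
                    ok = False
                    break
            if not ok:
                break
        if ok:
            kept.append(p)
            grid.setdefault((cx, cy), []).append(p)
    return kept

data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21),(1,2),(2,1)]
print(filter_min_dist(data, 5))  # (1,2) and (2,1) are dropped
```

Like the loop in the question, this keeps the first point of each close group it encounters, so the exact survivors depend on input order. `math.dist` requires Python 3.8+.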
You can use numpy; try this:
import time
import numpy as np

data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21),(1,2),(2,1)]
start_time = time.time()
# transform to a numpy array
a = np.array(data)
subs = a[:, None] - a
# calculate the Euclidean distance between all pairs of elements
dist = np.sqrt(np.einsum('ijk,ijk->ij', subs, subs))
# for every pair (i, j) with i < j and distance < 5, drop element j;
# the upper triangle (k=1) excludes the zero diagonal, and j is dropped
# even if i was itself dropped, so this can discard slightly more points
# than the greedy loop below
too_close = np.triu(dist < 5, k=1).any(axis=0)
a = a[~too_close]
print("--- %s seconds ---" % (time.time() - start_time))  # got --- 0.00020575523376464844 seconds ---
When we compare to your solution:
# data and euclidean as defined in the question
start_time = time.time()
new_data = []
while len(data) > 0:
    check = data.pop()
    flag = True
    for i in data:
        if euclidean(check, i) < 5:
            flag = False
            break
    if flag:
        new_data.append(check)
print("--- %s seconds ---" % (time.time() - start_time))  # got --- 0.001013040542602539 seconds ---
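One caveat for the sizes mentioned in the question: the full n×n distance matrix will not fit in memory at 500k points (500k² doubles is on the order of terabytes). A sketch using `scipy.spatial.cKDTree` (my own suggestion, not from either answer) only materializes the pairs that are actually close, which is what drives the cost at these sizes:

```python
import numpy as np
from scipy.spatial import cKDTree

data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21),(1,2),(2,1)]
tree = cKDTree(np.array(data))

# query_pairs returns every pair (i, j) with i < j and distance <= r
# (note: <= r, so it is slightly stricter than the question's "< 5").
# Greedily drop j whenever its partner i is still kept; any two surviving
# points can never have appeared together as a pair, so all survivors are
# pairwise farther than r apart.
dropped = set()
for i, j in sorted(tree.query_pairs(r=5)):
    if i not in dropped:
        dropped.add(j)

new_data = [p for k, p in enumerate(data) if k not in dropped]
print(new_data)
```

Building the tree is O(n log n), and for data that is mostly spread out the number of returned pairs stays small, so both time and memory remain manageable at 500k points.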