What is the most time efficient way to calculate the distance between tuples in a list?
I have a list with tuples:
tuple_list = [(1,3),(4,7),(8,1),(5,4),(9,3),(7,2),(2,7),(3,1),(8,9),(5,2)]
From this list, I want to return the minimum distance between the two numbers in a tuple.
In the naive approach, I would do the following:
distance = 10
for tup in tuple_list:
    if abs(tup[0] - tup[1]) < distance:
        distance = abs(tup[0] - tup[1])
Then, in the end, distance would equal 1.
However, I suspect there is a faster method to obtain the minimum distance that calculates all the distances in parallel.
To be clear, in the CPython reference interpreter, parallelized computations are pretty useless; the GIL prevents you from gaining meaningful benefit from CPU-bound work like this unless the work can be done by an extension that manually releases the GIL, using non-Python types. numpy could gain you some benefit (if the data was already in a numpy array) by vectorizing (likely to do better than actual parallelization anyway, unless the data is enormous), but no matter how you slice it, the general case, for arbitrary data, will be O(n); you can't improve on that in the general case because every item must be considered, so even in ideal circumstances you're just applying a constant divisor to the work, and it remains O(n).
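For illustration, if the data were already in a numpy array, the vectorized version might look like this (a sketch; it assumes numpy is installed):

```python
import numpy as np

tuple_list = [(1,3),(4,7),(8,1),(5,4),(9,3),(7,2),(2,7),(3,1),(8,9),(5,2)]

arr = np.array(tuple_list)                       # shape (n, 2)
distance = np.abs(arr[:, 0] - arr[:, 1]).min()   # vectorized, but still O(n)
# distance == 1
```

The per-element work moves into C loops, so the constant factor drops, but the asymptotic cost is unchanged.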
You can simplify your code a bit, and use constructs that are better optimized in CPython, e.g.
distance = min(abs(d1 - d2) for d1, d2 in tuple_list)
which will compute abs(d1 - d2) only once per loop, and potentially save a little overhead over the plain for loop + if check (plus, it'll remove the need to come up with an initializer for distance that's definitely larger than the minimum that should replace it), but it's still O(n); it's just simpler code with some minor micro-optimizations.
In some special cases you could improve on this though. If you must regularly modify the list, and must be able to quickly determine the smallest difference at any given point in time, you could use a heap with precomputed differences. Adding a new item, or removing the minimum item, in the heap would be O(log n) (constructing the heap in the first place being O(n)), and getting the current smallest item would be O(1) (it's always at index 0).
Constructing the heap in the first place:
import heapq
tuple_list = [(1,3),(4,7),(8,1),(5,4),(9,3),(7,2),(2,7),(3,1),(8,9),(5,2)]
tuple_heap = [(abs(a - b), (a, b)) for a, b in tuple_list] # O(n) work
heapq.heapify(tuple_heap) # O(n) work; tuple_heap.sort() would also work,
# but it would be O(n log n)
Adding a new item (where x and y are the items to add):
heapq.heappush(tuple_heap, (abs(x - y), (x, y))) # O(log n)
Popping off the current smallest item:
diff, tup = heapq.heappop(tuple_heap) # O(log n)
# Or to unpack values:
diff, (x, y) = heapq.heappop(tuple_heap) # O(log n)
Getting the values from the current smallest item (without removing it):
diff, tup = tuple_heap[0] # O(1)
# Or to unpack values:
diff, (x, y) = tuple_heap[0] # O(1)
Obviously, this only makes sense if you regularly need the current minimum item and the set of things to consider is constantly changing, but it's one of the few cases where you can get better than O(n) performance in common cases, without paying more than O(n) in setup costs.
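Putting the snippets above together, a small end-to-end sketch of the heap workflow (the pair (6, 6) is an arbitrary value chosen to demonstrate a push):

```python
import heapq

tuple_list = [(1,3),(4,7),(8,1),(5,4),(9,3),(7,2),(2,7),(3,1),(8,9),(5,2)]

# Build the heap once: O(n)
tuple_heap = [(abs(a - b), (a, b)) for a, b in tuple_list]
heapq.heapify(tuple_heap)

smallest = tuple_heap[0][0]                # O(1) peek: 1, from (5, 4) or (3, 1)

heapq.heappush(tuple_heap, (0, (6, 6)))    # O(log n): add a pair with distance 0
new_smallest = tuple_heap[0][0]            # now 0

diff, (x, y) = heapq.heappop(tuple_heap)   # O(log n): remove that minimum again
after_pop = tuple_heap[0][0]               # back to 1
```

Each query for the minimum stays O(1) no matter how many pushes and pops happen in between, which is the whole point of maintaining the heap.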
The only way you could optimise this would be a multi-threaded solution, calculating the tuple distance for each tuple in a thread; you would probably see a time advantage for large lists, but in terms of complexity it will still be the same O(n). The solution you provided is already optimal: it has a time complexity of O(n), and there is no approach better than O(n) for finding a minimum in a list.
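A sketch of that chunked approach using the standard library (note the GIL caveat from the other answer: threads won't actually speed up pure-Python CPU-bound work in CPython, though the same chunking structure would carry over to a process pool; the chunk size of 5 is arbitrary):

```python
from concurrent.futures import ThreadPoolExecutor

tuple_list = [(1,3),(4,7),(8,1),(5,4),(9,3),(7,2),(2,7),(3,1),(8,9),(5,2)]

def chunk_min(chunk):
    # Each worker reduces its own slice; total work is still O(n).
    return min(abs(a - b) for a, b in chunk)

# Split the list into slices, one task per slice.
chunks = [tuple_list[i:i + 5] for i in range(0, len(tuple_list), 5)]
with ThreadPoolExecutor(max_workers=2) as pool:
    distance = min(pool.map(chunk_min, chunks))
# distance == 1
```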