
What is the most time efficient way to calculate the distance between tuples in a list?

I have a list with tuples:

tuple_list = [(1,3),(4,7),(8,1),(5,4),(9,3),(7,2),(2,7),(3,1),(8,9),(5,2)]

From this list, I want to return the minimum distance of two numbers in a tuple.

In the naive approach, I would do the following:

distance = 10  # initial value chosen to be larger than any distance in the list
for tup in tuple_list:
    if abs(tup[0]-tup[1]) < distance:
        distance = abs(tup[0]-tup[1])

Then, in the end, distance would equal 1.

However, I suspect there is a faster method to obtain the minimum distance that calculates all the distances in parallel.

To be clear, in the CPython reference interpreter, parallelized computations are pretty useless; the GIL prevents you from gaining meaningful benefit from CPU-bound work like this unless the work can be done by an extension that manually releases the GIL, using non-Python types. numpy could gain you some benefit (if the data was already in a numpy array) by vectorizing (likely to do better than actual parallelization anyway, unless the data is enormous), but no matter how you slice it, the general case, for arbitrary data, will be O(n); you can't improve on that in the general case because every item must be considered, so even in ideal circumstances, you're just applying a constant divisor to the work, but it remains O(n).
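For illustration, a minimal sketch of that numpy vectorization (assuming the pairs are already in, or are worth converting to, a numpy array; the total work is still O(n), only the constant factor shrinks):

import numpy as np

tuple_list = [(1,3),(4,7),(8,1),(5,4),(9,3),(7,2),(2,7),(3,1),(8,9),(5,2)]
arr = np.asarray(tuple_list)                    # shape (n, 2)
distance = np.abs(arr[:, 0] - arr[:, 1]).min()  # vectorized, no Python-level loop
print(distance)                                 # 1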

You can simplify your code a bit, and use constructs that are better optimized in CPython, e.g.:

distance = min(abs(d1 - d2) for d1, d2 in tuple_list)

which will compute abs(d1 - d2) only once per loop, and potentially save a little overhead over the plain for loop + if check (plus, it removes the need to come up with an initializer for distance that's definitely larger than the minimum that should replace it), but it's still O(n); it's just simpler code with some minor micro-optimizations.
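As a side note (not part of the original answer), if you also want to know which pair produced that minimum, a key function gets you the tuple itself in the same single O(n) pass:

closest_pair = min(tuple_list, key=lambda t: abs(t[0] - t[1]))  # (5, 4)
distance = abs(closest_pair[0] - closest_pair[1])               # 1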

In some special cases you could improve on this though. If you must regularly modify the list, and must be able to quickly determine the smallest difference at any given point in time, you could use a heap with precomputed differences. Adding a new item, or removing the minimum item, in the heap would be O(log n) (constructing the heap in the first place being O(n)), and getting the current smallest item would be O(1) (it's always at index 0).

Constructing the heap in the first place:

import heapq

tuple_list = [(1,3),(4,7),(8,1),(5,4),(9,3),(7,2),(2,7),(3,1),(8,9),(5,2)]
tuple_heap = [(abs(a - b), (a, b)) for a, b in tuple_list]  # O(n) work
heapq.heapify(tuple_heap)  # O(n) work; tuple_heap.sort() would also work,
                           # but it would be O(n log n)

Adding a new item (where x and y are the items to add):

heapq.heappush(tuple_heap, (abs(x - y), (x, y)))  # O(log n)

Popping off the current smallest item:

diff, tup = heapq.heappop(tuple_heap)  # O(log n)
# Or to unpack values:
diff, (x, y) = heapq.heappop(tuple_heap)  # O(log n)

Getting values from the current smallest item (without removing it):

diff, tup = tuple_heap[0]  # O(1)
# Or to unpack values:
diff, (x, y) = tuple_heap[0]  # O(1)

Obviously, this only makes sense if you regularly need the current minimum item and the set of things to consider is constantly changing, but it's one of the few cases where you can get better than O(n) performance in common cases without paying more than O(n) in setup costs.
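Putting those pieces together, a short end-to-end sketch (the extra pair (7, 7) is an arbitrary value added purely for illustration):

import heapq

tuple_list = [(1,3),(4,7),(8,1),(5,4),(9,3)]
tuple_heap = [(abs(a - b), (a, b)) for a, b in tuple_list]
heapq.heapify(tuple_heap)                         # O(n) setup

print(tuple_heap[0])                              # (1, (5, 4)) -- current minimum, O(1)
heapq.heappush(tuple_heap, (abs(7 - 7), (7, 7)))  # add a new pair, O(log n)
print(tuple_heap[0])                              # (0, (7, 7)) -- minimum updated
diff, (x, y) = heapq.heappop(tuple_heap)          # remove the current minimum, O(log n)
print(diff, x, y)                                 # 0 7 7
print(tuple_heap[0])                              # (1, (5, 4)) -- back to the previous minimum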

The only way you could optimise this would be a multi-threaded solution, calculating the distance for each tuple in a thread; you would probably see a time advantage for large lists, but in terms of complexity it will still be the same O(n). The solution you provided is already optimal: it already has a time complexity of O(n), and there isn't a more efficient way than O(n) to find a minimum in a list.
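A minimal sketch of the chunked, multi-threaded idea described here, using concurrent.futures from the standard library (the chunk size is an arbitrary choice, and, as the previous answer notes, the GIL means this is unlikely to beat the plain min() for pure-Python work):

from concurrent.futures import ThreadPoolExecutor

def chunk_min(chunk):
    # Minimum absolute difference within one chunk of pairs.
    return min(abs(a - b) for a, b in chunk)

def parallel_min_distance(pairs, chunk_size=1000):
    chunks = [pairs[i:i + chunk_size] for i in range(0, len(pairs), chunk_size)]
    with ThreadPoolExecutor() as pool:
        return min(pool.map(chunk_min, chunks))   # still O(n) overall

tuple_list = [(1,3),(4,7),(8,1),(5,4),(9,3),(7,2),(2,7),(3,1),(8,9),(5,2)]
print(parallel_min_distance(tuple_list, chunk_size=4))  # 1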
