Fastest approach for performing operations on all possible combinations

I'm looking for the fastest way to get the minimum absolute difference over all possible pairs of elements in a list.

I wrote two solutions, but neither is fast enough.

import itertools

def absDiff(elem):
    # Absolute difference of a 2-tuple (a, b)
    return abs(elem[0] - elem[1])

# first solution takes 5.96 sec
def minAbsDiff1(arr):
    seq = itertools.combinations(arr, 2)
    m = min(seq, key=absDiff)
    return absDiff(m)

# second solution takes 6.96 sec
def minAbsDiff2(arr):
    seq = itertools.combinations(arr, 2)
    test = [abs(tup[0] - tup[1]) for tup in seq]
    return min(test)

arr = list(range(10000))
minAbsDiff1(arr)
minAbsDiff2(arr)

Input example: [3, -7, 0]

All combinations: (3, -7), (3, 0), (-7, 0)

Output (minimum absolute difference): 3

Explanation: 3 - 0 = 3
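A minimal brute-force sketch of the example above, useful as a reference when checking faster versions. The name `min_abs_diff_bruteforce` is hypothetical; it does the same O(n²) work as the two solutions in the question:

```python
import itertools

def min_abs_diff_bruteforce(arr):
    # Enumerate every unordered pair and keep the smallest |a - b|
    return min(abs(a - b) for a, b in itertools.combinations(arr, 2))

example = [3, -7, 0]
print(list(itertools.combinations(example, 2)))  # [(3, -7), (3, 0), (-7, 0)]
print(min_abs_diff_bruteforce(example))          # 3
```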

Solutions

Another approach that may give you a much faster result:

Sort the values first, then iterate over them to find the minimum difference between neighbours:

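def minAbsDiffSorted(arr):
    sorted_arr = sorted(arr)
    # Upper bound: the full range of the data
    min_val = sorted_arr[-1] - sorted_arr[0]
    # Compare each element only with its successor in sorted order
    for i, j in zip(sorted_arr[:-1], sorted_arr[1:]):
        min_val = min(min_val, j - i)
    return min_val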

Doing the same with NumPy is even faster:

import numpy as np

def minAbsDiffNumpy(arr):
    # Sort, take differences between neighbours, return the smallest one
    return np.diff(np.sort(np.array(arr))).min()

Mechanism

Array to process:

import numpy as np
import random
arr = np.array([random.randint(0, 100) for _ in range(20)])
>>>
array([55, 76, 88,  2, 68,  9, 24, 50, 15, 86, 19, 31, 80, 39, 14, 48, 32,
       32, 35, 26])

Let's sort the array:

arr = np.sort(arr)
>>>
array([ 2,  9, 14, 15, 19, 24, 26, 31, 32, 32, 35, 39, 48, 50, 55, 68, 76,
       80, 86, 88])

Get the differences between consecutive values:

np.diff(arr)
>>>
array([ 7,  5,  1,  4,  5,  2,  5,  1,  0,  3,  4,  9,  2,  5, 13,  8,  4,
        6,  2])

Take the minimum of these differences, which in this case is 0 (from the duplicate 32s). This equals the minimum absolute difference over all pairwise combinations of the original array.
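If you also need to know which pair attains the minimum, `np.argmin` on the differences gives its position in the sorted array. A sketch using the 20 values printed above; `closest_pair` is a hypothetical helper, and it returns the first minimal pair when there are ties:

```python
import numpy as np

def closest_pair(arr):
    # Sort once; the closest pair must be adjacent in sorted order
    s = np.sort(np.asarray(arr))
    d = np.diff(s)
    i = int(np.argmin(d))  # index of the first smallest gap
    # .item() converts NumPy scalars to plain Python ints
    return (s[i].item(), s[i + 1].item()), d[i].item()

arr = [55, 76, 88, 2, 68, 9, 24, 50, 15, 86, 19, 31, 80, 39, 14, 48, 32, 32, 35, 26]
print(closest_pair(arr))  # ((32, 32), 0)
```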

Timings

Here are the respective times on my machine, for a 10,000-element input as in the question (not the 20-element demo array above):

%%timeit
minAbsDiff1(arr)
17.3 s ± 438 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
minAbsDiff2(arr)
19.1 s ± 1.16 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
minAbsDiffSorted(arr)
7.85 ms ± 498 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
minAbsDiffNumpy(arr)
444 µs ± 3.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
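`%%timeit` is an IPython/Jupyter magic. Outside a notebook, a rough equivalent is the standard `timeit` module. The sketch below re-implements the three approaches under hypothetical names (`brute`, `sorted_scan`, `numpy_diff`), checks that they agree, and uses a smaller n than the question so the O(n²) version finishes quickly. The printed numbers will depend entirely on your machine:

```python
import itertools
import random
import timeit

import numpy as np

def brute(arr):
    # O(n^2): every unordered pair
    return min(abs(a - b) for a, b in itertools.combinations(arr, 2))

def sorted_scan(arr):
    # O(n log n): sort, then compare neighbours
    s = sorted(arr)
    return min(b - a for a, b in zip(s, s[1:]))

def numpy_diff(arr):
    # Same as sorted_scan, vectorised with NumPy
    return int(np.diff(np.sort(np.asarray(arr))).min())

# n = 1,000 instead of 10,000 to keep the brute-force run short
arr = [random.randint(-10**6, 10**6) for _ in range(1_000)]
assert brute(arr) == sorted_scan(arr) == numpy_diff(arr)

for f in (brute, sorted_scan, numpy_diff):
    t = timeit.timeit(lambda: f(arr), number=5) / 5
    print(f"{f.__name__}: {t * 1e3:.3f} ms per call")
```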

Explanation

For the reasons behind this, see @Yves Daoust's detailed explanation.

Yes, the combinations-based approach may also sort the results. However, its dominant cost is not sorting but generating the combinations themselves.

You can read more about the time complexity of itertools.combinations here.

By contrast, in the sorted approach the most expensive operation is the sort itself; the rest is a single linear pass.
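A quick back-of-the-envelope comparison for the question's n = 10,000: `itertools.combinations` must produce n(n-1)/2 pairs, whereas a comparison sort needs on the order of n log₂ n comparisons (constant factors ignored, so this is only a rough scale):

```python
import math

n = 10_000
pairs = math.comb(n, 2)          # n * (n - 1) / 2 unordered pairs
print(pairs)                     # 49995000
print(round(n * math.log2(n)))   # 132877, rough scale of sorting work
```

That is roughly fifty million pair evaluations versus a few hundred thousand comparisons, which is consistent with the gap in the timings.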

If you sort the elements in increasing order, the closest element to each one is either its immediate predecessor or its immediate successor. Hence it suffices to check every pair of consecutive elements.
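The claim that consecutive pairs suffice follows from a one-line telescoping argument, sketched here for sorted values $a_1, \dots, a_n$:

```latex
Let $a_1 \le a_2 \le \dots \le a_n$ be the sorted values and $i < j$. Then
\[
  |a_j - a_i| = a_j - a_i = \sum_{k=i}^{j-1} (a_{k+1} - a_k) \;\ge\; a_{i+1} - a_i,
\]
since every term of the sum is non-negative. Hence
\[
  \min_{i<j} |a_j - a_i| = \min_{1 \le k < n} (a_{k+1} - a_k).
\]
```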

This trades the O(n²) scan over all pairs for an O(n) pass over neighbours, a significant improvement. The sort itself takes O(n log n) and dominates the total cost (still far better than O(n²)), unless your data allows a non-comparison sort such as counting sort or radix sort.
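As one example of the non-comparison case: when the values are integers spanning a small range k = max - min + 1, a counting-sort-style presence table brings the whole computation to O(n + k). A sketch under that assumption; `min_abs_diff_counting` is a hypothetical name, it expects at least two integers, and it is only worthwhile when k is not much larger than n:

```python
def min_abs_diff_counting(arr):
    # Assumes integers with a small range, so a table of size
    # max - min + 1 fits in memory; total time is O(n + k).
    lo, hi = min(arr), max(arr)
    seen = [False] * (hi - lo + 1)
    for x in arr:
        if seen[x - lo]:
            return 0  # a duplicate value means the minimum difference is 0
        seen[x - lo] = True
    best = hi - lo
    prev = None
    # Walking the table in index order visits the values in sorted order
    for offset, present in enumerate(seen):
        if present:
            if prev is not None:
                best = min(best, offset - prev)
            prev = offset
    return best

print(min_abs_diff_counting([3, -7, 0]))  # 3
```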
