
Fastest approach for performing operations on all possible combinations

I'm looking for the fastest approach to get the minimum absolute difference over all possible pair combinations from a list.

I wrote two solutions, but neither is acceptable in terms of running time.

arr = list(range(10000))
minAbsDiff1(arr)
minAbsDiff2(arr)

import itertools

def absDiff(elem):
    return abs(elem[0] - elem[1])

# first solution takes 5.96 sec
def minAbsDiff1(arr):
    seq = itertools.combinations(arr, 2)
    m = min(seq, key=absDiff)
    return absDiff(m)

# second solution takes 6.96 sec
def minAbsDiff2(arr):
    seq = itertools.combinations(arr, 2)
    test = [abs(tup[0] - tup[1]) for tup in seq]
    return min(test)

Input example: [3, -7, 0]

All combinations: (3, -7), (3, 0), (-7, 0)

Output min abs diff: 3

Explanation: |3 - 0| = 3 is the smallest absolute difference among the pairs.
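A quick brute-force check of the example with itertools.combinations (just to confirm the expected output, not a fast solution):

```python
from itertools import combinations

arr = [3, -7, 0]
# All pairs: (3, -7), (3, 0), (-7, 0) -> abs diffs 10, 3, 7
print(min(abs(a - b) for a, b in combinations(arr, 2)))  # → 3
```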

Solutions

Another possible way that might give you a faster result:

Sort the values first, then iterate over them to find the minimum difference:

def minAbsDiffSorted(arr):
    sorted_arr = sorted(arr)
    # Start with the largest possible difference (max - min).
    min_val = sorted_arr[-1] - sorted_arr[0]
    # In a sorted list, the closest pair is always adjacent.
    for i, j in zip(sorted_arr[:-1], sorted_arr[1:]):
        min_val = min(min_val, j - i)
    return min_val

Doing the same with numpy is even faster:

import numpy as np
def minAbsDiffNumpy(arr):
    return np.diff(np.sort(np.array(arr))).min()

Mechanism

Array to process:

import numpy as np
import random
arr = np.array([random.randint(0, 100) for _ in range(20)])
>>>
array([55, 76, 88,  2, 68,  9, 24, 50, 15, 86, 19, 31, 80, 39, 14, 48, 32,
       32, 35, 26])

Let's sort the array:

arr = np.sort(arr)
>>>
array([ 2,  9, 14, 15, 19, 24, 26, 31, 32, 32, 35, 39, 48, 50, 55, 68, 76,
       80, 86, 88])

Get the differences between the values:

np.diff(arr)
>>>
array([ 7,  5,  1,  4,  5,  2,  5,  1,  0,  3,  4,  9,  2,  5, 13,  8,  4,
        6,  2])

Taking the minimum of these differences gives 0 in this case, which equals the minimum absolute difference over all pairwise combinations of the original array.
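As a small sanity check (a sketch on a made-up array), the sorted-difference minimum matches the brute-force minimum over all pairs:

```python
import numpy as np
from itertools import combinations

arr = np.array([55, 76, 88, 2, 68, 9])
sorted_min = np.diff(np.sort(arr)).min()
brute_min = min(abs(a - b) for a, b in combinations(arr.tolist(), 2))
assert sorted_min == brute_min
print(sorted_min)  # → 7 (the gap between 2 and 9)
```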

Times

Here are the respective times on my machine:

%%timeit
minAbsDiff1(arr)
17.3 s ± 438 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
minAbsDiff2(arr)
19.1 s ± 1.16 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
minAbsDiffSorted(arr)
7.85 ms ± 498 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
minAbsDiffNumpy(arr)
444 µs ± 3.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Explanation

For the reasoning behind this, see @Yves Daoust's detailed explanation.

Yes, the combinations-based versions could also have their results sorted; however, their dominant cost is not sorting but generating the combinations themselves.

Here you can read more about itertools.combinations time complexity.
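For intuition on why the combination-based versions are slow: itertools.combinations(arr, 2) yields C(n, 2) = n(n-1)/2 pairs, so the number of pairs to generate and compare grows quadratically with the input size:

```python
from math import comb

for n in (100, 1000, 10000):
    # comb(n, 2) == n * (n - 1) // 2 pairs
    print(n, comb(n, 2))
# 10000 elements already mean 49,995,000 pairs
```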

In the sorted approach, by contrast, the most expensive operation is the sort itself.

If you sort the elements in increasing order, the closest element to any given element is either its immediate predecessor or its immediate successor. Hence it suffices to check every consecutive pair.

Doing so, you trade O(n²) pair comparisons for O(n), a significant improvement. The sort itself takes O(n log n) and dominates the cost (still far better than O(n²)), unless your data permits a non-comparison-based sort.
