The fastest way to find all pairs of numbers in a list that differ by no more than x

Question

What I have now:

d = 0
res = 0
newlist = []
l = [4, 1, 6, 1, 1, 1]

for el in range(len(l)):
    for j in range(len(l)):
        if abs(l[el] - l[j]) <= d and el != j and el not in newlist and j not in newlist:
            newlist.append(el)
            newlist.append(j)
            res += 1

print(res)

It works and returns 2, which is correct (the pairs are 1,1 and 1,1), but it takes too much time. How can I make it faster? Thanks.

For example, if l = [1, 1, 1, 1] and d = 0, there will be 2 pairs, because each number can be used only once. Using (a, b) and (b, c) together is not allowed, and (a, b) is the same pair as (b, a).

Answer 1

Sort the list, then walk through it.

Once you have the list sorted, you can just be greedy: take the earliest pair that works, then the next, then the next... and you will end up with the maximum number of valid pairs.

def get_pairs(lst, maxdiff):
    sl = sorted(lst) # may want to do lst.sort() if you don't mind changing lst
    count = 0
    i = 1
    N = len(sl)
    while i < N:
        # no need for abs -- we know the previous value is not bigger.
        if sl[i] - sl[i-1] <= maxdiff:
            count += 1
            i += 2 # these two values are now used
        else:
            i += 1
    return count
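
The claim that the greedy walk finds the maximum number of pairs can be sanity-checked on small inputs. The sketch below is not part of the original answer: greedy_pairs repeats the function above so the block runs on its own, and brute_force_pairs is a hypothetical exhaustive search added only for comparison. It is far too slow for real data and is meant for lists of a handful of elements.

```python
from random import randrange, seed


def greedy_pairs(values, maxdiff):
    # Same greedy walk as get_pairs above, repeated so this block is self-contained.
    s = sorted(values)
    count = 0
    i = 1
    while i < len(s):
        if s[i] - s[i - 1] <= maxdiff:
            count += 1
            i += 2  # both values are now used
        else:
            i += 1
    return count


def brute_force_pairs(values, maxdiff):
    # Hypothetical reference implementation: try every way of pairing
    # (or not pairing) the first element and keep the best result.
    if len(values) < 2:
        return 0
    first, rest = values[0], values[1:]
    best = brute_force_pairs(rest, maxdiff)  # leave `first` unpaired
    for k, other in enumerate(rest):
        if abs(first - other) <= maxdiff:
            remaining = rest[:k] + rest[k + 1:]
            best = max(best, 1 + brute_force_pairs(remaining, maxdiff))
    return best


seed(0)  # reproducible samples
for _ in range(200):
    sample = [randrange(0, 10) for _ in range(randrange(0, 9))]
    d = randrange(0, 4)
    # Fails with the offending input if the two counts ever disagree.
    assert greedy_pairs(sample, d) == brute_force_pairs(sample, d), (sample, d)

print(greedy_pairs([4, 1, 6, 1, 1, 1], 0))  # the list from the question: prints 2
```

A randomized check like this is evidence rather than proof, but it agrees with the usual exchange argument: in sorted order, pairing a value with its immediate neighbour never costs a pair that some other choice could have kept.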

And here's some code to benchmark it:

from random import randrange, seed
from time import time

print('generating list...')
seed(0)  # always same contents
l = [randrange(0, 5000) for i in range(1000000)]

print('ok, measuring...')

start = time()
print(get_pairs(l, 0))
print('took', time() - start, 'seconds')

And the result (with 1 million values in list):

tmp$ ./test.py 
generating list...
ok, measuring...
498784
took 0.6729779243469238 seconds

Answer 2

You may want to compute all the pairs separately and then collect the pairs you want.

def get_pairs(l, difference):
    pairs = []
    # first compute all pairs: n choose 2 which is O(n^2)
    for i in range(len(l)):  # range works on Python 2 and 3; xrange is Python 2 only
        for j in range(i+1, len(l)):
            pairs.append((l[i], l[j]))

    # collect pairs you want: O(n^2)
    res = []
    for pair in pairs:
        if abs(pair[0] - pair[1]) <= difference:
            res.append(pair)
    return res

>>> get_pairs([1,2,3,4,2], 0)
[(2, 2)]
>>> get_pairs([1,2,3,4,2], 1)
[(1, 2), (1, 2), (2, 3), (2, 2), (3, 4), (3, 2)]

If you want to remove duplicates from your result, you can convert the res list to a set before you return it, with set(res).
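
One caveat with set(res): it treats (2, 3) and (3, 2) as different tuples, so order-only duplicates survive. A minimal sketch of one way around that, assuming the order inside a pair does not matter; unique_pairs is a hypothetical name, and itertools.combinations stands in for the two nested loops above:

```python
from itertools import combinations


def unique_pairs(values, difference):
    # Hypothetical helper, not from the answer: enumerate every i < j pair,
    # keep those within `difference`, and sort each pair so that
    # (3, 2) and (2, 3) collapse into the same set element.
    return {
        tuple(sorted(pair))
        for pair in combinations(values, 2)
        if abs(pair[0] - pair[1]) <= difference
    }


print(sorted(unique_pairs([1, 2, 3, 4, 2], 1)))
# [(1, 2), (2, 2), (2, 3), (3, 4)]
```

This is still O(n^2) and lists value pairs rather than counting pairs where each element is used once, so it answers a different reading of the question than the sorted greedy walk does.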

2 answers

solution1
3 ACCEPTED 2015-11-07 19:34:47

solution2
1 2015-11-07 18:25:09
