How do I find the closest n numbers to a given number at x distance from it?

Question

For eg I have a list of number like

lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]

And I need 2 numbers at 3 distance from a given number let's say 5, so the output list should look like this

output_lst = [2, 8]

Here by distance I mean the distance on number-line and not in list index. So 3 numbers, 2 distance from 5 would give

output_lst = [3,3,7]

What I tired doing was to use nsmallest from heapq like this

check_number = 5

output_lst = nsmallest(3, lst, key=lambda x: abs(x - check_number))

But the problem here is that I don't know specify the distance. It will just outputs 3 closest numbers to 5.

[4,4,5]

Answer 1

You can use a list comprehension for that. See this post for more on list comprehensions.

>>> lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
>>> given_numer = 5
>>> distance = 3
>>> [i for i in lst if abs(i-given_numer)==distance]
[2, 8]

The logic is quite simple, we just see if the absolute value of the difference between each number and the given number, if so we return the value. Similarly

>>> distance = 2
>>> [i for i in lst if abs(i-given_numer)==distance]
[3, 3, 7]

Let's complicate a bit and try to use filter and closures. The code is:

_{Just to show that it is an alternative} .

def checkdistance(given_number,distance):
    def innerfunc(value):
        return abs(value-given_number)==distance
    return innerfunc


lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
given_number = 5
distance = 3
checkdistance3from5 = checkdistance(5,3)
list(filter(checkdistance3from5,lst))

Answer 2

numpy approach:

import numpy as np

check_number = 5
distance = 3
a = np.array([1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9])
a[np.absolute(a - check_number) == distance]

Check:

In [46]: a
Out[46]: array([1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9])

In [47]: a[np.absolute(a-5) == 3]
Out[47]: array([2, 8])

Timings (in ms 1/1000 seconds ) for differently sized arrays/lists:

In [141]: df
Out[141]:
           numpy  list_comprehension
size
10        0.0242              0.0148
20        0.0248              0.0179
30        0.0254              0.0219
50        0.0267              0.0288
100       0.0292              0.0457
1000      0.0712              0.3210
10000     0.4290              3.3700
100000    3.8900             33.6000
1000000  46.4000            343.0000

plot:

bar plot for arrays with size <= 1000 ( df[df.index<=1000].plot.bar() ):

Code:

def np_approach(n, check_number=5, distance=3):
    a = np.random.randint(0,100, n)
    return a[np.absolute(a - check_number) == distance]

def list_comprehension(n, check_number=5, distance=3):
    lst = np.random.randint(0,100, n).tolist()
    return [i for i in lst if abs(i-check_number)==distance]

In [102]: %timeit list_comprehension(10**2)
10000 loops, best of 3: 45.7 ┬╡s per loop

In [103]: %timeit np_approach(10**2)
10000 loops, best of 3: 29.2 ┬╡s per loop

In [104]: %timeit list_comprehension(10**3)
1000 loops, best of 3: 321 ┬╡s per loop

In [105]: %timeit np_approach(10**3)
The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 71.2 ┬╡s per loop

In [106]: %timeit list_comprehension(10**4)
100 loops, best of 3: 3.37 ms per loop

In [107]: %timeit np_approach(10**4)
1000 loops, best of 3: 429 ┬╡s per loop

In [108]: %timeit list_comprehension(10**5)
10 loops, best of 3: 33.6 ms per loop

In [109]: %timeit np_approach(10**5)
100 loops, best of 3: 3.89 ms per loop

In [110]: %timeit list_comprehension(10**6)
1 loop, best of 3: 343 ms per loop

In [111]: %timeit np_approach(10**6)
10 loops, best of 3: 46.4 ms per loop

In [112]: %timeit list_comprehension(50)
10000 loops, best of 3: 28.8 ┬╡s per loop

In [113]: %timeit np_approach(50)
10000 loops, best of 3: 26.7 ┬╡s per loop

In [118]: %timeit list_comprehension(40)
The slowest run took 6.61 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.89 ┬╡s per loop

In [119]: %timeit np_approach(40)
The slowest run took 8.87 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.2 ┬╡s per loop

In [120]: %timeit list_comprehension(30)
10000 loops, best of 3: 21.9 ┬╡s per loop

In [121]: %timeit np_approach(30)
10000 loops, best of 3: 25.4 ┬╡s per loop

In [122]: %timeit list_comprehension(20)
100000 loops, best of 3: 17.9 ┬╡s per loop

In [123]: %timeit np_approach(20)
10000 loops, best of 3: 24.8 ┬╡s per loop

In [124]: %timeit list_comprehension(10)
100000 loops, best of 3: 14.8 ┬╡s per loop

In [125]: %timeit np_approach(10)
10000 loops, best of 3: 24.2 ┬╡s per loop

Conclusion: numpy approach is faster compared to list comprehension approach for larger lists, for very small lists (less than 50 elements) it might be other way around

How do I find the closest n numbers to a given number at x distance from it?

Question

2 answers

solution1
3 ACCPTED 2016-06-05 09:22:50

solution2
3 2016-06-05 09:24:44

How do I find the closest n numbers to a given number at x distance from it?

Question

2 answers

solution1 3 ACCPTED 2016-06-05 09:22:50

solution2 3 2016-06-05 09:24:44

solution1
3 ACCPTED 2016-06-05 09:22:50

solution2
3 2016-06-05 09:24:44