How do I find the closest n numbers to a given number at x distance from it?

For example, I have a list of numbers like

lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]

And I need 2 numbers at distance 3 from a given number, say 5, so the output list should look like this

output_lst = [2, 8]

Here by distance I mean the distance on the number line, not the difference in list index. So 3 numbers at distance 2 from 5 would give

output_lst = [3,3,7]

What I tried was to use nsmallest from heapq like this

from heapq import nsmallest

check_number = 5

output_lst = nsmallest(3, lst, key=lambda x: abs(x - check_number))

But the problem here is that I don't know how to specify the distance. It just outputs the 3 numbers closest to 5.

[5, 4, 4]

You can use a list comprehension for that. See this post for more on list comprehensions.

>>> lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
>>> given_number = 5
>>> distance = 3
>>> [i for i in lst if abs(i - given_number) == distance]
[2, 8]

The logic is quite simple: we check whether the absolute value of the difference between each number and the given number equals the distance, and if so we keep the value. Similarly

>>> distance = 2
>>> [i for i in lst if abs(i - given_number) == distance]
[3, 3, 7]
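
The question also asks for at most n such numbers, which the comprehension above does not limit. A minimal sketch of one way to cap the result, assuming a hypothetical helper name numbers_at_distance (not part of the original answer) and using itertools.islice to stop after n matches:

```python
from itertools import islice

def numbers_at_distance(lst, given_number, distance, n=None):
    """Return up to n values from lst whose number-line distance from
    given_number equals distance (all matches if n is None).
    Hypothetical helper, for illustration only."""
    matches = (i for i in lst if abs(i - given_number) == distance)
    # islice stops consuming the generator once n matches were found
    return list(islice(matches, n))

lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
print(numbers_at_distance(lst, 5, 3, n=2))  # [2, 8]
print(numbers_at_distance(lst, 5, 2, n=3))  # [3, 3, 7]
print(numbers_at_distance(lst, 5, 2, n=2))  # [3, 3]
```

Matches are returned in list order, so with n smaller than the number of matches the first ones found win.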

Just to show an alternative, let's complicate things a bit and use filter with a closure. The code is:

def checkdistance(given_number,distance):
    def innerfunc(value):
        return abs(value-given_number)==distance
    return innerfunc


lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
given_number = 5
distance = 3
checkdistance3from5 = checkdistance(given_number, distance)
list(filter(checkdistance3from5, lst))  # [2, 8]
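
One possible reason to prefer the closure is that each predicate can be built once and reused. A small sketch of that, repeating the checkdistance function so the block runs on its own; the names is_3_from_5 and is_2_from_5 are made up for this example:

```python
def checkdistance(given_number, distance):
    # Returns a predicate that remembers given_number and distance
    def innerfunc(value):
        return abs(value - given_number) == distance
    return innerfunc

lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
is_3_from_5 = checkdistance(5, 3)
is_2_from_5 = checkdistance(5, 2)

print(list(filter(is_3_from_5, lst)))  # [2, 8]
print(list(filter(is_2_from_5, lst)))  # [3, 3, 7]
```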

numpy approach:

import numpy as np

check_number = 5
distance = 3
a = np.array([1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9])
a[np.absolute(a - check_number) == distance]

Check:

In [46]: a
Out[46]: array([1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9])

In [47]: a[np.absolute(a-5) == 3]
Out[47]: array([2, 8])
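
The == comparison above works for integers. If the data were floats (an assumption, since the question only uses integers), exact equality can miss values because of rounding, and np.isclose is one way to compare within a tolerance instead. A sketch with made-up sample values:

```python
import numpy as np

check_number = 5.0
distance = 0.3
a = np.array([4.6, 4.7, 5.0, 5.3, 5.4])  # made-up float data

diff = np.absolute(a - check_number)

# Exact comparison may miss values: 5.3 - 5.0 is not exactly 0.3 in binary
exact = a[diff == distance]

# np.isclose compares within a small default tolerance
close = a[np.isclose(diff, distance)]
print(close.tolist())  # [4.7, 5.3]
```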

Timings (in milliseconds) for differently sized arrays/lists:

In [141]: df
Out[141]:
           numpy  list_comprehension
size
10        0.0242              0.0148
20        0.0248              0.0179
30        0.0254              0.0219
50        0.0267              0.0288
100       0.0292              0.0457
1000      0.0712              0.3210
10000     0.4290              3.3700
100000    3.8900             33.6000
1000000  46.4000            343.0000

plot: (image not included)

bar plot for arrays with size <= 1000 ( df[df.index<=1000].plot.bar() ): (image not included)

Code:

def np_approach(n, check_number=5, distance=3):
    a = np.random.randint(0,100, n)
    return a[np.absolute(a - check_number) == distance]

def list_comprehension(n, check_number=5, distance=3):
    lst = np.random.randint(0,100, n).tolist()
    return [i for i in lst if abs(i-check_number)==distance]

In [102]: %timeit list_comprehension(10**2)
10000 loops, best of 3: 45.7 µs per loop

In [103]: %timeit np_approach(10**2)
10000 loops, best of 3: 29.2 µs per loop

In [104]: %timeit list_comprehension(10**3)
1000 loops, best of 3: 321 µs per loop

In [105]: %timeit np_approach(10**3)
The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 71.2 µs per loop

In [106]: %timeit list_comprehension(10**4)
100 loops, best of 3: 3.37 ms per loop

In [107]: %timeit np_approach(10**4)
1000 loops, best of 3: 429 µs per loop

In [108]: %timeit list_comprehension(10**5)
10 loops, best of 3: 33.6 ms per loop

In [109]: %timeit np_approach(10**5)
100 loops, best of 3: 3.89 ms per loop

In [110]: %timeit list_comprehension(10**6)
1 loop, best of 3: 343 ms per loop

In [111]: %timeit np_approach(10**6)
10 loops, best of 3: 46.4 ms per loop

In [112]: %timeit list_comprehension(50)
10000 loops, best of 3: 28.8 µs per loop

In [113]: %timeit np_approach(50)
10000 loops, best of 3: 26.7 µs per loop

In [118]: %timeit list_comprehension(40)
The slowest run took 6.61 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.89 µs per loop

In [119]: %timeit np_approach(40)
The slowest run took 8.87 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.2 µs per loop

In [120]: %timeit list_comprehension(30)
10000 loops, best of 3: 21.9 µs per loop

In [121]: %timeit np_approach(30)
10000 loops, best of 3: 25.4 µs per loop

In [122]: %timeit list_comprehension(20)
100000 loops, best of 3: 17.9 µs per loop

In [123]: %timeit np_approach(20)
10000 loops, best of 3: 24.8 µs per loop

In [124]: %timeit list_comprehension(10)
100000 loops, best of 3: 14.8 µs per loop

In [125]: %timeit np_approach(10)
10000 loops, best of 3: 24.2 µs per loop

Conclusion: the numpy approach is faster than the list comprehension approach for larger lists; for very small lists (fewer than about 50 elements) it might be the other way around.
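
The benchmark functions above generate the random data inside the timed call, so the numbers include that cost. A rough sketch of how the filtering alone could be timed outside IPython with the standard timeit module; the function names np_filter and list_filter and the chosen size are assumptions for this example, and actual timings will vary by machine:

```python
import timeit
import numpy as np

def np_filter(a, check_number=5, distance=3):
    return a[np.absolute(a - check_number) == distance]

def list_filter(lst, check_number=5, distance=3):
    return [i for i in lst if abs(i - check_number) == distance]

n = 10**4
a = np.random.randint(0, 100, n)  # data is generated once, outside the timing
lst = a.tolist()

# Both approaches should select the same values
assert np_filter(a).tolist() == list_filter(lst)

for name, stmt in [("numpy", lambda: np_filter(a)),
                   ("list comprehension", lambda: list_filter(lst))]:
    # best of 3 repeats, 100 calls each, reported per call in ms
    best = min(timeit.repeat(stmt, number=100, repeat=3)) / 100
    print(f"{name}: {best * 1e3:.4f} ms per call")
```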
