For example, I have a list of numbers like
lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
I need the 2 numbers at distance 3 from a given number, let's say 5, so the output list should look like this
output_lst = [2, 8]
Here by distance I mean the distance on the number line, not the distance in list index. So the 3 numbers at distance 2 from 5 would give
output_lst = [3,3,7]
What I tried was to use nsmallest from heapq like this
from heapq import nsmallest
check_number = 5
output_lst = nsmallest(3, lst, key=lambda x: abs(x - check_number))
But the problem here is that I don't know how to specify the distance. It just outputs the 3 numbers closest to 5.
[5, 4, 4]
You can use a list comprehension for that. See this post for more on list comprehensions.
>>> lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
>>> given_numer = 5
>>> distance = 3
>>> [i for i in lst if abs(i-given_numer)==distance]
[2, 8]
The logic is quite simple: for each number we check whether the absolute value of the difference between it and the given number equals the distance, and if so we keep the value. Similarly
>>> distance = 2
>>> [i for i in lst if abs(i-given_numer)==distance]
[3, 3, 7]
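If only the first few matches are wanted (the question speaks of "2 numbers at 3 distance"), the same test can be wrapped in a small helper. This is only a minimal sketch: the function name numbers_at_distance and its limit parameter are hypothetical and not part of the answer above.

```python
from itertools import islice

def numbers_at_distance(numbers, target, distance, limit=None):
    """Return the values whose number-line distance from target equals distance.

    If limit is given, stop after that many matches (in list order).
    """
    matches = (x for x in numbers if abs(x - target) == distance)
    # islice with a stop of None yields every match.
    return list(islice(matches, limit))

lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
print(numbers_at_distance(lst, 5, 3))           # [2, 8]
print(numbers_at_distance(lst, 5, 2))           # [3, 3, 7]
print(numbers_at_distance(lst, 5, 2, limit=2))  # [3, 3]
```

Using a generator expression means the scan stops as soon as limit matches have been found, instead of always walking the whole list.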
Let's complicate things a bit and use filter with a closure, just to show that it is an alternative. The code is:
def checkdistance(given_number, distance):
    # Return a predicate that remembers given_number and distance (a closure).
    def innerfunc(value):
        return abs(value - given_number) == distance
    return innerfunc

lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
given_number = 5
distance = 3
checkdistance3from5 = checkdistance(given_number, distance)
list(filter(checkdistance3from5, lst))  # [2, 8]
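The same predicate can also be built without a hand-written closure by binding arguments with functools.partial. This is a sketch of an alternative, not what the answer above does; the function name is_at_distance is hypothetical.

```python
from functools import partial

def is_at_distance(value, given_number, distance):
    """True if value lies exactly `distance` away from given_number."""
    return abs(value - given_number) == distance

lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]

# partial fixes given_number and distance; filter supplies each value.
check_3_from_5 = partial(is_at_distance, given_number=5, distance=3)
result = list(filter(check_3_from_5, lst))
print(result)  # [2, 8]
```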
numpy approach:
import numpy as np
check_number = 5
distance = 3
a = np.array([1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9])
a[np.absolute(a - check_number) == distance]
Check:
In [46]: a
Out[46]: array([1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9])
In [47]: a[np.absolute(a-5) == 3]
Out[47]: array([2, 8])
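To see why the one-liner works, it may help to split it into its intermediate steps. The sketch below assumes nothing beyond the array from the answer; the names diffs and mask are introduced here for illustration only.

```python
import numpy as np

a = np.array([1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9])
check_number = 5
distance = 3

# Step 1: element-wise distance of every value from check_number.
diffs = np.absolute(a - check_number)
print(diffs)     # [4 3 2 2 1 1 0 1 2 3 4]

# Step 2: boolean mask, True where the distance matches.
mask = diffs == distance

# Step 3: boolean indexing keeps only the elements where mask is True.
print(a[mask])   # [2 8]
```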
Timings (in ms, i.e. 1/1000 of a second) for differently sized arrays/lists:
In [141]: df
Out[141]:
           numpy  list_comprehension
size
10        0.0242              0.0148
20        0.0248              0.0179
30        0.0254              0.0219
50        0.0267              0.0288
100       0.0292              0.0457
1000      0.0712              0.3210
10000     0.4290              3.3700
100000    3.8900             33.6000
1000000  46.4000            343.0000
plot:
bar plot for arrays with size <= 1000 (df[df.index<=1000].plot.bar()):
Code:
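import numpy as np

def np_approach(n, check_number=5, distance=3):
    # Build a random integer array of size n, then filter with a boolean mask.
    a = np.random.randint(0, 100, n)
    return a[np.absolute(a - check_number) == distance]

def list_comprehension(n, check_number=5, distance=3):
    # Build the same kind of data as a plain list, then filter in pure Python.
    lst = np.random.randint(0, 100, n).tolist()
    return [i for i in lst if abs(i - check_number) == distance]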
In [102]: %timeit list_comprehension(10**2)
10000 loops, best of 3: 45.7 µs per loop
In [103]: %timeit np_approach(10**2)
10000 loops, best of 3: 29.2 µs per loop
In [104]: %timeit list_comprehension(10**3)
1000 loops, best of 3: 321 µs per loop
In [105]: %timeit np_approach(10**3)
The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 71.2 µs per loop
In [106]: %timeit list_comprehension(10**4)
100 loops, best of 3: 3.37 ms per loop
In [107]: %timeit np_approach(10**4)
1000 loops, best of 3: 429 µs per loop
In [108]: %timeit list_comprehension(10**5)
10 loops, best of 3: 33.6 ms per loop
In [109]: %timeit np_approach(10**5)
100 loops, best of 3: 3.89 ms per loop
In [110]: %timeit list_comprehension(10**6)
1 loop, best of 3: 343 ms per loop
In [111]: %timeit np_approach(10**6)
10 loops, best of 3: 46.4 ms per loop
In [112]: %timeit list_comprehension(50)
10000 loops, best of 3: 28.8 µs per loop
In [113]: %timeit np_approach(50)
10000 loops, best of 3: 26.7 µs per loop
In [118]: %timeit list_comprehension(40)
The slowest run took 6.61 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.89 µs per loop
In [119]: %timeit np_approach(40)
The slowest run took 8.87 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.2 µs per loop
In [120]: %timeit list_comprehension(30)
10000 loops, best of 3: 21.9 µs per loop
In [121]: %timeit np_approach(30)
10000 loops, best of 3: 25.4 µs per loop
In [122]: %timeit list_comprehension(20)
100000 loops, best of 3: 17.9 µs per loop
In [123]: %timeit np_approach(20)
10000 loops, best of 3: 24.8 µs per loop
In [124]: %timeit list_comprehension(10)
100000 loops, best of 3: 14.8 µs per loop
In [125]: %timeit np_approach(10)
10000 loops, best of 3: 24.2 µs per loop
Conclusion: the numpy approach is faster than the list comprehension approach for larger lists; for very small lists (fewer than 50 elements) it might be the other way around.
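The comparison can also be run outside IPython with the standard timeit module. The sketch below is an assumption-laden variant, not the benchmark used for the numbers above: it builds the data once and times only the filtering step, so its figures are not directly comparable with the table. The names filter_numpy, filter_list and compare are hypothetical.

```python
import timeit
import numpy as np

def filter_numpy(a, check_number=5, distance=3):
    return a[np.absolute(a - check_number) == distance]

def filter_list(lst, check_number=5, distance=3):
    return [i for i in lst if abs(i - check_number) == distance]

def compare(n, repeats=100):
    """Return (numpy_seconds, list_seconds) per call for input size n."""
    rng = np.random.default_rng(0)   # fixed seed so both see the same data
    a = rng.integers(0, 100, n)
    lst = a.tolist()
    t_np = timeit.timeit(lambda: filter_numpy(a), number=repeats) / repeats
    t_lc = timeit.timeit(lambda: filter_list(lst), number=repeats) / repeats
    return t_np, t_lc

for n in (10, 1000, 10000):
    t_np, t_lc = compare(n)
    print(f"n={n:>6}: numpy {t_np * 1e3:.4f} ms, list comprehension {t_lc * 1e3:.4f} ms")
```

Actual timings depend on the machine and the Python and numpy versions, so the crossover point between the two approaches may differ from the one reported above.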