I am currently working on a tower-defense game in Python (3.7). The game has to check the distance between different 2-D points a lot, so I wanted to find out which methods complete this task fastest.
I am not set on a particular norm. The 2-norm seems like the natural choice, but I'm not opposed to switching to the infinity norm if that is necessary for execution time ( https://en.wikipedia.org/wiki/Lp_space ).
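To make the difference between the two norms concrete, here is a small sketch (plain Python, function names are mine): the points accepted by the infinity norm form a square around the tower, those accepted by the 2-norm a circle.

```python
def in_range_2norm(tower, enemy, distance):
    """Euclidean (2-norm) check: accepts points inside a circle of radius `distance`."""
    dx, dy = tower[0] - enemy[0], tower[1] - enemy[1]
    # comparing squared values avoids the square root entirely
    return dx * dx + dy * dy <= distance * distance

def in_range_infnorm(tower, enemy, distance):
    """Chebyshev (infinity-norm) check: accepts points inside a square of half-width `distance`."""
    return max(abs(tower[0] - enemy[0]), abs(tower[1] - enemy[1])) <= distance

# (2.5, 2.5) relative to (0, 0) with distance 3: inside the square, outside the circle
print(in_range_infnorm((0, 0), (2.5, 2.5), 3))  # True
print(in_range_2norm((0, 0), (2.5, 2.5), 3))    # False, sqrt(12.5) > 3
```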
I was under the impression that list comprehensions are faster than for loops, and that built-in functions (e.g. in NumPy) are faster than most code I could write myself. I tested some alternatives for computing the distance and was surprised that I could not confirm this impression. In particular, the numpy.linalg.norm function performs horribly. This is the code I used to measure the runtimes:
import timeit
import numpy as np
import test_extension

number_of_repeats = 10000
tower_pos = (3, 5)
enemy_pos = ((1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9))
distance = 3
options = ('np.linalg 2 norm', 'np.linalg inf norm', 'inf norm manuel', 'manuel squared 2 norm', 'hard coded squared 2 norm', 'c-extension hard coded squared 2 norm', 'all in extension')

def list_comprehension(option):
    if option == 0:
        return [pos for pos in enemy_pos if np.linalg.norm(np.subtract(tower_pos, pos)) <= distance]
    elif option == 1:
        # ord=np.inf for the infinity norm (ord=1 would be the 1-norm)
        return [pos for pos in enemy_pos if np.linalg.norm(np.subtract(tower_pos, pos), ord=np.inf) <= distance]
    elif option == 2:
        return [pos for pos in enemy_pos if max(abs(np.subtract(tower_pos, pos))) <= distance]
    elif option == 3:
        return [pos for pos in enemy_pos if sum(np.square(np.subtract(tower_pos, pos))) <= distance**2]
    elif option == 4:
        return [pos for pos in enemy_pos for distance_vector in [np.subtract(tower_pos, pos)] if distance_vector[0]*distance_vector[0] + distance_vector[1]*distance_vector[1] <= distance**2]
    elif option == 5:
        return [pos for pos in enemy_pos if test_extension.distance(np.subtract(tower_pos, pos)) <= distance**2]
    elif option == 6:
        return test_extension.list_comprehension(tower_pos, enemy_pos, distance)
def for_loop(option):
    l = []
    if option == 0:
        for pos in enemy_pos:
            distance_vector = np.subtract(tower_pos, pos)
            norm = np.linalg.norm(distance_vector)
            if norm <= distance:
                l.append(pos)
    elif option == 1:
        for pos in enemy_pos:
            distance_vector = np.subtract(tower_pos, pos)
            norm = np.linalg.norm(distance_vector, ord=np.inf)
            if norm <= distance:
                l.append(pos)
    elif option == 2:
        for pos in enemy_pos:
            distance_vector = np.subtract(tower_pos, pos)
            norm = max(abs(distance_vector))
            if norm <= distance:
                l.append(pos)
    elif option == 3:
        for pos in enemy_pos:
            distance_vector = np.subtract(tower_pos, pos)
            norm = sum(distance_vector * distance_vector)
            if norm <= distance**2:
                l.append(pos)
    elif option == 4:
        for pos in enemy_pos:
            distance_vector = np.subtract(tower_pos, pos)
            norm = distance_vector[0]*distance_vector[0] + distance_vector[1]*distance_vector[1]
            if norm <= distance**2:
                l.append(pos)
    elif option == 5:
        d = test_extension.distance
        for pos in enemy_pos:
            distance_vector = np.subtract(tower_pos, pos)
            norm = d(distance_vector)
            if norm <= distance**2:
                l.append(pos)
    elif option == 6:
        return test_extension.for_loop(tower_pos, enemy_pos, distance)
    return l
print(f"{'_'.ljust(40)}list_comprehension for_loop")
for i, option in enumerate(options):
    times = [timeit.timeit(f'{method}({i})', number=number_of_repeats, globals={f'{method}': globals()[f'{method}']}) for method in ['list_comprehension', 'for_loop']]
    s = f'{option}:'.ljust(40)
    print(f"{s}{round(times[0], 5)}{' '*15} {round(times[1], 5)}")
which gave the following output:
list_comprehension for_loop
np.linalg 2 norm: 3.58053 3.56676
np.linalg inf norm: 3.17169 3.53372
inf norm manuel: 1.15261 1.1951
manuel squared 2 norm: 1.30239 1.28485
hard coded squared 2 norm: 1.11324 1.08586
c-extension hard coded squared 2 norm: 1.01506 1.05114
all in extension: 0.81358 0.81262
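As an aside, all of the hand-rolled variants above recompute `distance**2` once per enemy. Hoisting it out of the loop is a small free win; a sketch (the function name is mine, not from the benchmark):

```python
def hard_coded_hoisted(tower_pos, enemy_pos, distance_sq):
    """Squared-2-norm range check with `distance**2` computed once by the caller."""
    result = []
    tx, ty = tower_pos
    for ex, ey in enemy_pos:
        dx, dy = tx - ex, ty - ey
        if dx * dx + dy * dy <= distance_sq:
            result.append((ex, ey))
    return result

enemies = ((1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9))
print(hard_coded_hoisted((3, 5), enemies, 3 ** 2))  # [(2, 3), (3, 4), (4, 5), (5, 6)]
```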
test_extension contains the code below. I then created a C extension from test_extension using the Cython package ( https://cython.readthedocs.io/en/latest/src/tutorial/cython_tutorial.html ):
import numpy as np

def distance(vector: np.array):
    return vector[0]*vector[0] + vector[1]*vector[1]

def for_loop(single_pos, pos_list, distance):
    l = []
    for pos in pos_list:
        distance_vector = np.subtract(single_pos, pos)
        norm = distance_vector[0]*distance_vector[0] + distance_vector[1]*distance_vector[1]
        if norm <= distance**2:
            l.append(pos)
    return l

def list_comprehension(single_pos, pos_list, distance):
    return [pos for pos in pos_list for distance_vector in [np.subtract(single_pos, pos)] if distance_vector[0]*distance_vector[0] + distance_vector[1]*distance_vector[1] <= distance**2]
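For reference, building such a module with Cython typically goes through a small setup.py along these lines. Treat this as a sketch, not the original build script: compiling the .py file directly and the `language_level` option are assumptions on my part.

```python
# setup.py -- minimal Cython build sketch; run with: python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(
    # cythonize translates the module to C and compiles it to an extension module
    ext_modules=cythonize("test_extension.py", language_level=3),
)
```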
At the moment I mainly have three questions, but feel free to share any other insights:
The rest of this answer deals with the NumPy behaviour observed. First of all, list comprehensions are marginally faster than for loops (assuming you're building an iterable you want to keep).
Now, let's set up a function to collect timing statistics. We'll also create a few arrays up front, both because we'll use them frequently and because their creation time must be excluded from the measurements.
def timer(func):
    times = timeit.repeat(func, number=number_of_repeats, repeat=10)
    return f'{np.mean(times):.5f} ({np.std(times):.5f})'

enemy_numpy = np.array(enemy_pos)
tower_numpy = np.array(tower_pos)
enemy_pos_large = enemy_pos * 20
enemy_numpy_large = np.array(enemy_pos_large)
Python-level loops are slow compared to NumPy operations. However, most NumPy operations create a new array, and that allocation takes a long time. On top of that, every function call has its own overhead. All of this upfront cost can dominate over simply hardcoding a couple of index accesses and arithmetic operations, even at the slow Python level.
np.subtract(enemy_pos[i], tower_pos)
creates two 2-element arrays and then a third 2-element array for the result. The allocation time dominates over the small number of arithmetic operations involved.
>>> timer(lambda: np.subtract(enemy_pos[0], tower_pos))
'0.03206 (0.00031)'
>>> # initial array allocation
>>> timer(lambda: (np.array(enemy_pos[0]), np.array(tower_pos)))
'0.02101 (0.00014)'
>>> # straight up numpy subtraction of one enemy position
>>> timer(lambda: enemy_numpy[0] - tower_numpy)
'0.00636 (0.00019)'
>>> # pure python with creating tuples
>>> timer(lambda: (enemy_pos[0][0] - tower_pos[0], enemy_pos[0][1] - tower_pos[1]))
'0.00145 (0.00006)'
>>> # subtract all positions at once with `np.subtract`
>>> timer(lambda: np.subtract(enemy_pos, tower_pos))
'0.09057 (0.00029)'
>>> # same for the bigger array; 20x bigger, but only 10x slower
>>> timer(lambda: np.subtract(enemy_pos_large, tower_pos))
'1.07190 (0.00133)'
>>> # pure python beats numpy by an order of magnitude for the small list
>>> timer(lambda: [(ex - tower_pos[0], ey - tower_pos[1]) for ex, ey in enemy_pos])
'0.00918 (0.00007)'
>>> # pure python for the big list still wins, 20x bigger, 16x slower
>>> timer(lambda: [(ex - tower_pos[0], ey - tower_pos[1]) for ex, ey in enemy_pos_large])
'0.14962 (0.00055)'
>>> # preallocated small numpy arrays are barely edged out by pure python
>>> timer(lambda: enemy_numpy - tower_numpy)
'0.01140 (0.00014)'
>>> # preallocated big numpy arrays are outperforming pure python; 20x bigger, 2x slower
>>> timer(lambda: enemy_numpy_large - tower_numpy)
'0.01817 (0.00028)'
vector[0]*vector[0] + vector[1]*vector[1]
squares two integers and then adds them together. It avoids creating new arrays for the squaring and the sum, and because of that the slow Python-level operations edge out NumPy. But things change for a bigger array.
>>> v2 = np.arange(2)
>>> timer(lambda: v2[0]**2 + v2[1]**2)
'0.00741 (0.00020)'
>>> timer(lambda: np.add.reduce(v2**2))
'0.01756 (0.00027)'
>>> v6 = np.arange(6)
>>> # too many slow python operations
>>> timer(lambda: v6[0]**2 + v6[1]**2 + v6[2]**2 + v6[3]**2 + v6[4]**2 + v6[5]**2)
'0.02321 (0.00028)'
>>> # better off creating the intermediate arrays and letting numpy do its thing
>>> timer(lambda: np.add.reduce(v6**2))
'0.01744 (0.00016)'
Bottom line: converting a Python iterable to a NumPy array is costly. NumPy likes one method call acting on the whole data at once. If you can initialise your array only once and then keep working with it, then for anything other than very small sizes the intermediate-output arrays won't matter, and you'll get the advantage of fast NumPy operations.
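To illustrate the "initialise once" point with a self-contained sketch (data taken from the question, function names are mine):

```python
import numpy as np

tower_pos = (3, 5)
enemy_pos = ((1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9))
distance = 3

# Slow pattern: the Python tuples are converted to arrays on every single call
def in_range_convert_each_time():
    return [pos for pos in enemy_pos
            if np.sum(np.subtract(tower_pos, pos) ** 2) <= distance ** 2]

# Fast pattern: convert once up front, then do one whole-array operation per query
enemy_numpy = np.array(enemy_pos)
tower_numpy = np.array(tower_pos)

def in_range_convert_once():
    diff = enemy_numpy - tower_numpy               # one vectorised subtraction
    mask = np.sum(diff * diff, axis=1) <= distance ** 2
    return enemy_numpy[mask]                       # boolean-mask the original array

print(in_range_convert_each_time())  # [(2, 3), (3, 4), (4, 5), (5, 6)]
print(in_range_convert_once())       # same rows, as a NumPy array
```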
Now, let's write NumPy versions of your approaches that use the array versions of your lists, and see how they perform compared to Python loops. Let's also compare against pure-Python functions where no NumPy functions are ever called. Note that some options are skipped because their code would be identical to a previous approach.
def numpified(option):
    if option == 0:
        # this timing improves if the arrays are float by default,
        # as `np.linalg.norm` has to convert them otherwise
        return enemy_numpy[np.linalg.norm(tower_numpy - enemy_numpy, axis=1) <= distance]
    elif option == 1:
        return enemy_numpy[np.linalg.norm(tower_numpy - enemy_numpy, axis=1, ord=np.inf) <= distance]
    elif option == 2:
        return enemy_numpy[np.max(np.abs(tower_numpy - enemy_numpy), axis=1) <= distance]
    elif option == 3:
        return enemy_numpy[np.sum((tower_numpy - enemy_numpy)**2, axis=1) <= distance**2]
    elif option == 4:
        # same as option == 3
        return []
    elif option == 5:
        return enemy_numpy[test_extension.distance2d(tower_numpy - enemy_numpy) <= distance**2]
    elif option == 6:
        # same as option == 5
        return []
def pure(option):
    l = []
    tx, ty = tower_pos
    if option == 0:
        for ex, ey in enemy_pos:
            x = tx - ex
            y = ty - ey
            # since we're dealing with real numbers, |x|**2 is just x**2.
            # We can also skip taking the square root by comparing against `distance**2`
            if x*x + y*y <= distance**2:
                l.append((ex, ey))
    elif option == 1:
        for ex, ey in enemy_pos:
            if max(abs(tx - ex), abs(ty - ey)) <= distance:
                l.append((ex, ey))
    else:
        # the rest are the same as the above
        return []
    return l
methods = [list_comprehension, for_loop, numpified, pure]
print('{:40s}'.format('_') + ''.join(f'{m.__name__:20s}' for m in methods))
for i, option in enumerate(options):
    times = ''.join(f'{timer(lambda: m(i)):20s}' for m in methods)
    print(f'{option:40s}{times}')
In the test_extension module, also add the following function:

def distance2d(vector: np.array):
    return vector[:, 0]*vector[:, 0] + vector[:, 1]*vector[:, 1]
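For example, applied to the difference of the whole enemy array at once (the definition is repeated here so the snippet is self-contained; data from the question):

```python
import numpy as np

def distance2d(vector: np.ndarray):
    # squared 2-norm of every row at once, without creating a sum over an axis
    return vector[:, 0] * vector[:, 0] + vector[:, 1] * vector[:, 1]

tower_numpy = np.array((3, 5))
enemy_numpy = np.array(((1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)))
distance = 3

diff = tower_numpy - enemy_numpy
in_range = enemy_numpy[distance2d(diff) <= distance ** 2]
print(in_range)  # the rows within Euclidean distance 3 of the tower
```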
Timings
_ list_comprehension for_loop numpified pure
np.linalg 2 norm 0.61583 (0.00289) 0.61282 (0.00278) 0.12935 (0.00600) 0.02203 (0.00016)
np.linalg inf norm 0.68147 (0.00486) 0.68253 (0.00515) 0.09752 (0.00338) 0.02001 (0.00009)
inf norm manuel 0.38958 (0.00235) 0.38934 (0.00275) 0.07727 (0.00096) 0.00151 (0.00002)
manuel squared 2 norm 0.43648 (0.00114) 0.42091 (0.00053) 0.08799 (0.00165) 0.00152 (0.00001)
hard coded squared 2 norm 0.34672 (0.00061) 0.34031 (0.00067) 0.00162 (0.00002) 0.00152 (0.00001)
c-extension hard coded squared 2 norm 0.34750 (0.00097) 0.34609 (0.00089) 0.09728 (0.00251) 0.00152 (0.00002)
all in extension 0.34823 (0.00082) 0.34101 (0.00087) 0.00184 (0.00001) 0.00154 (0.00001)
We can see that pure Python is actually doing better. Let's now try with the bigger arrays, comparing only numpified against pure. Set the following before running the timing loop again.
enemy_pos = enemy_pos_large
enemy_numpy = enemy_numpy_large
methods = [numpified, pure]
Timings
_ numpified pure
np.linalg 2 norm 0.13939 (0.00460) 0.41680 (0.00130)
np.linalg inf norm 0.12642 (0.00134) 0.38518 (0.00072)
inf norm manuel 0.10712 (0.00107) 0.00155 (0.00004)
manuel squared 2 norm 0.12589 (0.00057) 0.00156 (0.00004)
hard coded squared 2 norm 0.00165 (0.00007) 0.00153 (0.00002)
c-extension hard coded squared 2 norm 0.12898 (0.00288) 0.00154 (0.00002)
all in extension 0.00199 (0.00005) 0.00166 (0.00004)
You can probably squeeze out more performance with numba, but before you look into that, consider spatial partitioning.
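To show what spatial partitioning buys you: a minimal uniform-grid sketch (all names here are illustrative, not from the question's code). Enemies are bucketed by cell once; each range query then only examines the handful of cells the radius can touch instead of the whole list.

```python
from collections import defaultdict

CELL = 4  # cell size; a reasonable choice is at least the largest query distance

def build_grid(positions, cell=CELL):
    """Bucket each point by its integer cell coordinates."""
    grid = defaultdict(list)
    for x, y in positions:
        grid[(x // cell, y // cell)].append((x, y))
    return grid

def query(grid, center, distance, cell=CELL):
    """Return points within `distance` (2-norm) of `center`, checking only nearby cells."""
    cx, cy = center[0] // cell, center[1] // cell
    reach = distance // cell + 1  # how many neighbouring cells the radius can spill into
    found = []
    for gx in range(int(cx - reach), int(cx + reach) + 1):
        for gy in range(int(cy - reach), int(cy + reach) + 1):
            for x, y in grid.get((gx, gy), ()):
                dx, dy = center[0] - x, center[1] - y
                if dx * dx + dy * dy <= distance * distance:
                    found.append((x, y))
    return found

enemy_pos = ((1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9))
grid = build_grid(enemy_pos)
print(query(grid, (3, 5), 3))  # [(2, 3), (3, 4), (4, 5), (5, 6)]
```

With many enemies and many towers, the per-query cost stops scaling with the total enemy count and scales with the local density instead.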