How to speed up a for loop?

Question

I am trying to speed up this for loop in Python, where N is a large number (around 12000). The loop takes over 20 seconds to run. The lines calculating cossum and sinsum seem to be the main culprits. How can I optimise this so it runs faster?

(I am pretty new to coding, so a simple explanation would be much appreciated! Thanks)

for i in range(N): 

    cossum = 0
    sinsum = 0

    sq = k_array[i][0]**2 + k_array[i][1]**2 + k_array[i][2]**2
   
    if sq <= sqmax and sq>0:

        kvec_used = kvec_used + 1
        kfac[i] = np.exp(-sq/(4*beta**2))/(sq)
    
        for j in range(sites):
            cossum = cossum + q_array_initial[j]*np.cos(np.dot(k_array[i],r_cart_initial[j]))
            sinsum = sinsum + q_array_initial[j]*np.sin(np.dot(k_array[i],r_cart_initial[j]))

            cos_part = (abs(cossum))**2
            sin_part = (abs(sinsum))**2

Answer 1

You can remove all of the for loops in your code using vectorization in NumPy. I recommend reading Look Ma No For Loops as you start venturing down this path.

I took your example and generated some junk data to stand in for your arrays and variables using the random module.

import numpy as np
import random
import time

kvec_used = 0
k_array = []
sqmax = 10
beta = .5
kfac = []
sites = 3
q_array_initial = r_cart_initial = [.1, .2, .3]

for i in range(12000):
    temp = [random.random(), random.random(), random.random()]
    k_array.append(temp)
    kfac.append(random.random())

mytime = time.time()
for i in range(12000): 
    cossum = 0
    sinsum = 0

    sq = k_array[i][0]**2 + k_array[i][1]**2 + k_array[i][2]**2

    if sq <= sqmax and sq>0:

        kvec_used = kvec_used + 1
        kfac[i] = np.exp(-sq/(4*beta**2))/(sq)
    
        for j in range(sites):
            cossum = cossum + q_array_initial[j]*np.cos(np.dot(k_array[i],r_cart_initial[j]))
            sinsum = sinsum + q_array_initial[j]*np.sin(np.dot(k_array[i],r_cart_initial[j]))

            cos_part = (abs(cossum))**2
            sin_part = (abs(sinsum))**2

print("looped", time.time() - mytime)


mytime = time.time()

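# Convert the lists to arrays so whole columns can be processed at once
k_array = np.asarray(k_array)
kfac = np.asarray(kfac)
# Squared length of every k-vector in one step
sq = np.square(k_array[:, 0]) + np.square(k_array[:, 1]) + np.square(k_array[:, 2])
# Indices where the original `if sq <= sqmax and sq > 0` would pass
idx = np.where(np.logical_and(sq <= sqmax, sq > 0))
kvec_used = idx[0].size
# Update kfac only at those indices, as the loop does
kfac[idx] = np.exp(-sq[idx]/(4*beta**2))/sq[idx]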

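# The inner loop over sites is left unvectorized. Note that here it runs only once,
# reusing the last value of i and the cossum/sinsum left over from the loop above.
for j in range(sites):
    cossum = cossum + q_array_initial[j]*np.cos(np.dot(k_array[i],r_cart_initial[j]))
    sinsum = sinsum + q_array_initial[j]*np.sin(np.dot(k_array[i],r_cart_initial[j]))

    cos_part = (abs(cossum))**2
    sin_part = (abs(sinsum))**2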

print("vectorized:", time.time() - mytime)

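By vectorizing only the first for loop, I get a speedup of roughly 80 times when I measure using time.time(). Note that in the second version the remaining inner loop runs only once rather than once per k-vector, so this figure overstates what vectorizing the first loop alone achieves. On my machine, the output of this code is: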

looped 0.40552735328674316
vectorized: 0.004940986633300781

You should see a further speedup once you vectorize the second loop.

Edit: I just realized you asked for more explanation in your original question. The article I linked above explains things in detail and is what I used when I first started vectorizing large data sets. However, I can offer a super-simple explanation to get you going here:

When you have a large array, looping over it element by element in Python is slow. NumPy runs the same operations in compiled code behind the scenes, which is much faster than stepping through the array in a for loop. For the purposes of a novice learning this, you can accept that NumPy is "magic" for the time being. The big takeaway here is that rather than going through and operating on the i-th element of your array, you want to tell NumPy to operate on the whole thing, represented as myarray[:]. You can extend this to multidimensional arrays. A full 2D array is myarray[:, :]. If (thinking in spreadsheet terms) you just want to operate on the 0th column and all rows of a 2D array, this becomes myarray[:, 0].
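As a minimal sketch of the slicing described above, here is the same operation written as a loop and as a whole-column operation. The array name myarray and its values are made up purely for illustration.

```python
import numpy as np

# Hypothetical 2D array: 3 rows, 2 columns (values chosen only for illustration).
myarray = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])

# Loop version: square the 0th column one element at a time.
looped = []
for i in range(myarray.shape[0]):
    looped.append(myarray[i, 0] ** 2)

# Vectorized version: square the whole 0th column in one operation.
vectorized = myarray[:, 0] ** 2

# Operating on the full 2D array works the same way.
doubled = myarray[:, :] * 2

print(looped)
print(vectorized)
print(doubled)
```

Both versions produce the same numbers; the vectorized one simply hands the whole column to NumPy instead of visiting each row from Python.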

Here, I have used the NumPy functions np.where and np.logical_and to replace your if conditional. This is another vectorization technique, which handles conditional statements more quickly than an iterative approach.
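To make the conditional replacement concrete, here is a small sketch on made-up values. The names sq and sqmax mirror the question, but the numbers are invented for illustration.

```python
import numpy as np

# Hypothetical values standing in for sq and sqmax from the question.
sq = np.array([0.0, 0.5, 2.0, 12.0])
sqmax = 10

# Boolean mask: True wherever `if sq <= sqmax and sq > 0` would pass.
mask = np.logical_and(sq <= sqmax, sq > 0)

# np.where(mask) returns a tuple of index arrays for the True entries.
indices = np.where(mask)[0]

# Indexing with the mask or with the indices selects the same elements.
selected_by_mask = sq[mask]
selected_by_index = sq[indices]

# Counting the True entries replaces the kvec_used counter.
kvec_used = np.count_nonzero(mask)

print(mask)
print(indices)
print(selected_by_mask)
print(kvec_used)
```

The mask keeps its position information, which is handy if you later need to write results back into an array of the original length.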

I purposely left the second for loop untranslated as an exercise for you. No one learns without suffering! You did well to use np.cos and np.sin here, but you use them in an iterative way. Use the vector approach to run these math functions on all elements of the result vector at once.
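If you want something to check your attempt against, here is one possible sketch, not the only way to do it. It assumes k_array has shape (N, 3), r_cart_initial has shape (sites, 3), and q_array_initial has shape (sites,); the random data and the small sizes are made up for illustration. It also computes cos_part and sin_part once after the sum over sites, and it does not apply the sq condition, which you could add with a mask.

```python
import numpy as np

# Made-up data with the assumed shapes.
rng = np.random.default_rng(0)
N, sites = 200, 5
k_array = rng.random((N, 3))
r_cart_initial = rng.random((sites, 3))
q_array_initial = rng.random(sites)

# phases[i, j] = dot(k_array[i], r_cart_initial[j]); shape (N, sites).
phases = k_array @ r_cart_initial.T

# Sum over the sites j for every i at once; each result has shape (N,).
cossum = np.cos(phases) @ q_array_initial
sinsum = np.sin(phases) @ q_array_initial

cos_part = np.abs(cossum) ** 2
sin_part = np.abs(sinsum) ** 2

# Spot check against the original inner loop for a single index.
i = 7
loop_cossum = 0.0
for j in range(sites):
    loop_cossum += q_array_initial[j] * np.cos(np.dot(k_array[i], r_cart_initial[j]))
print(np.isclose(cossum[i], loop_cossum))
```

The matrix product builds every dot product in one call, so np.cos and np.sin each run once on an (N, sites) array instead of N times sites on single numbers.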

Answer 2

Use vectorization instead of for loops. You cannot speed it up much unless you have developed your

2 answers

solution1
2 ACCEPTED 2021-02-24 19:41:08

solution2
-1 2021-02-26 06:19:09
