How to get rid of nested for loops in Python code?

Question

I have 1 year of satellite measurements of electrons (the instrument measured every 4 seconds), stored in an array called 'electrons'. I also have the corresponding times as datetime.datetime objects (called 'time'). I want to average the electrons array to get a mean value for every minute instead of every 4 seconds, and put the results in a new array 'g'. However, when I write the loops, it becomes extremely slow. Is there any way to make it faster? Here is what I do:

import numpy as np
import spacepy.time as spt
import datetime as dt

# assumes 'electrons' and 'time' already exist, with 'time' holding RDT values
# so that it can be compared with D1 and D2 below
year=2001
g=np.full((525600,1), np.nan) #one row per minute of 2001 (365*24*60 minutes)
for month in range (1,13):
    dmax=np.array([31,28,31,30,31,30,31,31,30,31,30,31]) #number of days in a month
    for day in range(1,dmax[month-1]+1):
        for hour in range(24):
            for minute in range(60):
                #spt is spacepy.time, and '.RDT' returns the Gregorian ordinal time
                D1=spt.Ticktock(dt.datetime(year, month, day, hour, minute, 0,0),'UTC').RDT #lower boundary of a minute
                D2=spt.Ticktock(dt.datetime(year, month, day, hour, minute, 59,999999),'UTC').RDT #upper boundary of a minute

                mask=((time>D1)&(time<D2))

                electrons_logic=electrons[mask]
                k=(dt.date(year, month, day).timetuple().tm_yday-1)*24*60+hour*60+minute #number of the minute in the year (0-based)
                g[k,0]=np.nanmean(electrons_logic)

Is there a way to avoid the nested loops and make it faster?

Maybe there is a way to make it faster using multiprocessing/parallel computing?
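For reference, the per-minute averaging described above can be done without any Python-level loop. The sketch below assumes (hypothetically) that `time` has been converted to a NumPy `datetime64` array and that `electrons` is a 1-D float array of the same length. It assigns every sample the index of its minute from the start of the year, then uses `np.bincount` to sum the values and count the valid (non-NaN) samples in each minute. The helper name `minute_means` is illustrative only.

```python
import numpy as np


def minute_means(time, electrons, year):
    """Average samples into one bin per minute of `year`.

    Assumes `time` is a numpy datetime64 array and `electrons` a float
    array of the same shape; NaN samples are ignored, like np.nanmean.
    """
    start = np.datetime64(f"{year}-01-01T00:00", "m")
    stop = np.datetime64(f"{year + 1}-01-01T00:00", "m")
    n_minutes = int((stop - start) / np.timedelta64(1, "m"))

    # Minute index of every sample relative to the start of the year.
    idx = (time.astype("datetime64[m]") - start).astype(np.int64)

    # Keep samples that fall inside the year and are not NaN.
    ok = (idx >= 0) & (idx < n_minutes) & ~np.isnan(electrons)
    sums = np.bincount(idx[ok], weights=electrons[ok], minlength=n_minutes)
    counts = np.bincount(idx[ok], minlength=n_minutes)

    # Minutes without any valid sample stay NaN (no division by zero).
    means = np.full(n_minutes, np.nan)
    has_data = counts > 0
    means[has_data] = sums[has_data] / counts[has_data]
    return means
```

If `time` is a list of `datetime.datetime` objects, `np.array(time, dtype="datetime64[us]")` converts it first; the result can then be stored with `g[:, 0] = minute_means(...)`, using 0-based minute indices.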

Answer 1

Whenever you have a problem regarding iteration, think of itertools.

import numpy as np
from itertools import product

dmax=np.array([31,28,31,30,31,30,31,31,30,31,30,31]) #number of days in a month
for month in range (1,13):
    for day, hour, minute in product(range(1,dmax[month-1]+1), range(24), range(60)):
        ...

I also advise defining dmax outside the loop; otherwise it is re-created on every iteration over month.
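Note that `product` only flattens the nesting: it yields exactly the same `(day, hour, minute)` tuples, in the same order, as the three nested loops, so the number of iterations (and the work done in each) is unchanged. A small check of that equivalence, using a 31-day month as an example:

```python
from itertools import product

n_days = 31  # e.g. January; any month length behaves the same way

# Tuples produced by three explicit nested loops.
nested = [
    (day, hour, minute)
    for day in range(1, n_days + 1)
    for hour in range(24)
    for minute in range(60)
]

# Tuples produced by a single loop over itertools.product.
flat = list(product(range(1, n_days + 1), range(24), range(60)))

print(len(flat))  # 31 * 24 * 60 = 44640 iterations either way
```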

Answer 2

The alternative (at least for the 3 inner loops) is to loop over the minutes of the month, then use division and remainder to compute the day, hour and minute:

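import numpy as np

dmax=np.array([31,28,31,30,31,30,31,31,30,31,30,31]) #number of days in a month
for month in range (1,13):
    nb_days = dmax[month-1]
    for m in range(60*24*nb_days):
        hour,minute = divmod(m,60)   # total hours elapsed in the month, minute within the hour
        day,hour = divmod(hour,24)   # days elapsed in the month, hour within the day
        day += 1                     # days are 1-based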

This is a trade-off between doing two divisions/modulos per iteration (divmod computes the quotient and the remainder in one go) and running two extra nested loops. As Python loops are expensive, it's worth trying.
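As a check of the divmod decomposition (with `divmod(hours, 24)` in the second step, since a day has 24 hours), this sketch decodes every minute offset of a 28-day month and compares the result with the order of the nested loops. The helper name `decode` is illustrative.

```python
def decode(m):
    """Split a minute offset within a month into (day, hour, minute)."""
    hours, minute = divmod(m, 60)  # whole hours elapsed, minute within the hour
    day, hour = divmod(hours, 24)  # whole days elapsed, hour within the day
    return day + 1, hour, minute   # days are 1-based


nb_days = 28  # February 2001
decoded = [decode(m) for m in range(60 * 24 * nb_days)]
```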

Answer 3

If there is any static initialization in your code, such as

dmax=np.array([[31,28,31,30,31,30,31,31,30,31,30,31]]).T #number of days in a month

it should be moved outside the for loop. Otherwise, the array is created again every time the loop runs, which is wasted work.
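Along the same lines, the month lengths can be computed once, before the loop, with the standard-library `calendar.monthrange`, which also handles leap years (a hard-coded array always gives February 28 days). A minimal sketch:

```python
import calendar

year = 2001

# Computed once, outside any loop: number of days in each month of `year`.
# calendar.monthrange returns (weekday of the 1st, number of days).
days_in_month = [calendar.monthrange(year, month)[1] for month in range(1, 13)]

total_days = 0
for month, n_days in enumerate(days_in_month, start=1):
    for day in range(1, n_days + 1):
        total_days += 1  # stand-in for the real per-day work
```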

Answer 4

It looks like you're not using month, day, hour and minute for anything other than computing the time boundaries of each minute.

You can do it in just one loop, with something like this, without even having to hard-code the array of days per month:

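import datetime as dt
import numpy as np
import spacepy.time as spt

# assumes g, time and electrons are defined as in the question
year=2001
DT1=dt.datetime(year, 1, 1, 0, 0, 0, 0)        # lower boundary of the current minute
DT2=dt.datetime(year, 1, 1, 0, 0, 59, 999999)  # upper boundary of the current minute
DToneMin=dt.timedelta(minutes=1)
DTy=dt.datetime(year+1, 1, 1, 0, 0, 0, 0)-DT1  # length of the year

for k in range(int(DTy.total_seconds()//60)):  # one iteration per minute of the year
    D1=spt.Ticktock(DT1,'UTC').RDT
    D2=spt.Ticktock(DT2,'UTC').RDT
    DT1+=DToneMin
    DT2+=DToneMin

    g[k,0]=np.nanmean(electrons[(time>D1)&(time<D2)])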
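In all of these versions, much of the run time probably goes into the mask `(time>D1)&(time<D2)`, which scans the whole year of data once per minute. If `time` is sorted, as a time series normally is, `np.searchsorted` can find each minute's samples by binary search instead. A sketch of the single loop above rewritten that way, assuming (hypothetically) that `time` is a sorted NumPy `datetime64` array rather than RDT values; the helper name `minute_means_sorted` is illustrative.

```python
import numpy as np


def minute_means_sorted(time, electrons, year):
    """One loop over the minutes of `year`, using binary search on sorted `time`.

    Assumes `time` is a sorted numpy datetime64 array; NaN samples are ignored.
    """
    # Start of every minute of the year, plus the closing edge (next 1 January).
    edges = np.arange(
        np.datetime64(f"{year}-01-01T00:00"),
        np.datetime64(f"{year + 1}-01-01T00:01"),
        np.timedelta64(1, "m"),
    )
    # Samples of minute k are time[lo[k]:lo[k + 1]], i.e. edges[k] <= t < edges[k + 1].
    lo = np.searchsorted(time, edges.astype(time.dtype), side="left")

    g = np.full(len(edges) - 1, np.nan)
    for k in range(len(edges) - 1):
        chunk = electrons[lo[k]:lo[k + 1]]
        if np.any(~np.isnan(chunk)):  # skip empty or all-NaN minutes
            g[k] = np.nanmean(chunk)
    return g
```

Unlike the mask, each minute's lookup here costs a slice rather than a pass over the full array, while the loop structure stays the same as in the answer.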

4 answers

solution1
2 2018-06-20 20:07:34

solution2
1 2018-06-20 19:44:16

solution3
0 2018-06-20 19:43:57

solution4
0 2018-06-20 21:46:11
