
How to get rid of nested for loops in Python code?

I have one year of satellite electron measurements (the instrument took a measurement every 4 seconds). This array is called 'electrons'. I also have the corresponding times as datetime.datetime objects (called 'time'). I want to average the electrons array to get a mean value for every minute instead of every 4 seconds, and put the results in a new array 'g'. However, when I write the loops, the code becomes extremely slow. Is there any way to make it faster? Here is what I do:

import numpy as np
import spacepy.time as spt
import datetime as dt

year=2001
for month in range (1,13):
    dmax=np.array([[31,28,31,30,31,30,31,31,30,31,30,31]]).T #number of days in a month
    for day in range(1,dmax[month-1,0]+1):
        for hour in range(24):
            for minute in range(60):
                #here, spt is spacepy.time, and '.RDT' returns the Gregorian ordinal time
                D1=spt.Ticktock(dt.datetime(year, month, day, hour, minute, 0,0),'UTC').RDT #lower boundary of a minute
                D2=spt.Ticktock(dt.datetime(year, month, day, hour, minute, 59,999999),'UTC').RDT #upper boundary of a minute

                mask=((time>D1)&(time<D2))

                electrons_logic=electrons[mask]
                #number of the minute in a year (1-based); the month offset is the total number of days in the preceding months
                k=dmax[:month-1].sum()*24*60+(day-1)*24*60+hour*60+(minute+1)
                g[k,0]=np.nanmean(electrons_logic)

Is there a way to avoid the nested loops and make it faster?

Maybe there is a way to make it faster using multiprocessing/parallel computing?

Whenever you have a problem involving iteration, think of itertools.

from itertools import product

dmax=np.array([[31,28,31,30,31,30,31,31,30,31,30,31]]).T
for month in range (1,13):
    for day, hour, minute in product(range(1,dmax[month-1,0]+1), range(24), range(60)):
        ...

I also advise defining dmax outside the loop; otherwise it is re-created on every month iteration.
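To make the itertools suggestion concrete, here is a small self-contained sketch. The names DAYS_IN_MONTH and minutes_of_year are made up for this illustration, and a plain list replaces the NumPy column vector so that range() receives ordinary integers. It assumes a non-leap year such as 2001.

```python
from itertools import product

# Days per month for a non-leap year such as 2001 (illustrative constant).
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def minutes_of_year():
    """Yield (month, day, hour, minute) for every minute of a non-leap year."""
    for month in range(1, 13):
        days = range(1, DAYS_IN_MONTH[month - 1] + 1)
        # product() replaces the three inner nested loops with a single one.
        for day, hour, minute in product(days, range(24), range(60)):
            yield month, day, hour, minute

stamps = list(minutes_of_year())
print(len(stamps))  # one entry per minute of the year
```

Note that product only flattens the syntax: the loop body still runs once per minute of the year, so on its own this is unlikely to change the running time much.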

The alternative (at least for the 3 inner loops) is to loop on the number of minutes, then use division+remainder to compute hour and day:

dmax=np.array([[31,28,31,30,31,30,31,31,30,31,30,31]]).T #number of days in a month
for month in range (1,13):
    nb_days = dmax[month-1,0]
    for m in range(60*24*nb_days):
        hour,minute = divmod(m,60) #total hours since the start of the month, minute within the hour
        day,hour = divmod(hour,24) #day offset within the month, hour within the day
        day += 1

This is a tradeoff between performing 2 divisions/modulos at each iteration (the divmod function does both in one call) and running 2 extra loops. As Python loops are expensive, it's worth trying.
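As a quick sanity check of the division-and-remainder idea, the sketch below decodes a flat minute index into (day, hour, minute) and compares the result with what the three nested loops would produce. The helper name split_minute_index is hypothetical, and February (28 days) is used only as an example month.

```python
def split_minute_index(m):
    """Turn a 0-based minute index within a month into (day, hour, minute)."""
    total_hours, minute = divmod(m, 60)   # 60 minutes per hour
    day, hour = divmod(total_hours, 24)   # 24 hours per day
    return day + 1, hour, minute          # days are 1-based

nb_days = 28  # example month length
decoded = [split_minute_index(m) for m in range(60 * 24 * nb_days)]
nested = [(d, h, mi)
          for d in range(1, nb_days + 1)
          for h in range(24)
          for mi in range(60)]
print(decoded == nested)
```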

If your code contains any static initialization, such as

dmax=np.array([[31,28,31,30,31,30,31,31,30,31,30,31]]).T #number of days in a month

it should be placed outside the for loop. Otherwise the array is re-created on every iteration of that loop, which is wasted work.

It looks like you're not using month, day, hour and minute for anything other than calculating the time boundaries.

You can do it in a single loop, without even hard-coding the days-in-a-month array, with something like this:

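year=2001
DT1=dt.datetime(year, 1, 1, 0, 0, 0, 0) #lower boundary of the first minute
DT2=dt.datetime(year, 1, 1, 0, 0, 59, 999999) #upper boundary of the first minute
DToneMin=dt.timedelta(minutes=1) #step by one minute, since each average covers one minute
DTy=dt.datetime(year+1, 1, 1, 0, 0, 0, 0)-DT1 #length of the year

for k in range (1,int(DTy.total_seconds()//60)+1): #k is the 1-based number of the minute in the year
    D1=spt.Ticktock(DT1,'UTC').RDT
    DT1+=DToneMin
    D2=spt.Ticktock(DT2,'UTC').RDT
    DT2+=DToneMin

    g[k,0]=np.nanmean(electrons[(time>D1)&(time<D2)])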
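The single loop above still builds one boolean mask over the full array for every minute of the year. A different technique, not taken from the answers here, is to compute each sample's minute index once and let np.bincount accumulate sums and counts. The sketch below is only an illustration under stated assumptions: the function name minute_means is made up, the times are assumed to be convertible to NumPy datetime64 values (datetime.datetime objects are), and the result is a 0-based 1-D array rather than the g[k,0] layout used in the question.

```python
import datetime as dt
import numpy as np

def minute_means(times, values, year):
    """Return one NaN-aware mean per minute of `year` (NaN where a minute has no data)."""
    t = np.asarray(times, dtype="datetime64[us]")
    v = np.asarray(values, dtype=float)
    start = np.datetime64(f"{year}-01-01T00:00", "m")
    stop = np.datetime64(f"{year + 1}-01-01T00:00", "m")
    n_minutes = int((stop - start) / np.timedelta64(1, "m"))
    # Truncate every timestamp to its minute and express it as an offset from the start of the year.
    idx = (t.astype("datetime64[m]") - start).astype(np.int64)
    # Ignore samples outside the year and NaN measurements (mirrors np.nanmean).
    keep = (idx >= 0) & (idx < n_minutes) & ~np.isnan(v)
    sums = np.bincount(idx[keep], weights=v[keep], minlength=n_minutes)
    counts = np.bincount(idx[keep], minlength=n_minutes)
    out = np.full(n_minutes, np.nan)
    filled = counts > 0
    out[filled] = sums[filled] / counts[filled]
    return out

# Tiny made-up sample: two readings in the first minute, one NaN, one in the last minute.
sample_times = [dt.datetime(2001, 1, 1, 0, 0, 4), dt.datetime(2001, 1, 1, 0, 0, 8),
                dt.datetime(2001, 1, 1, 0, 2, 0), dt.datetime(2001, 12, 31, 23, 59, 56)]
sample_values = [1.0, 3.0, np.nan, 5.0]
g_sample = minute_means(sample_times, sample_values, 2001)
```

One behavioural difference to be aware of: the loops above use strict inequalities, so a sample falling exactly on a minute boundary is excluded there, whereas this sketch assigns it to the minute that starts at that instant.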

