I have one year of satellite measurements of electrons (the instrument sampled every 4 seconds), stored in an array called 'electrons'. I also have the corresponding times as datetime.datetime objects (called 'time'). I want to average the electrons array to get a mean value for every minute instead of every 4 seconds, and put the results in a new array 'g'. However, when I write the loops, it becomes extremely slow. Is there any way to make it faster? Here is what I do:
import numpy as np
import spacepy.time as spt
import datetime as dt
year=2001
for month in range (1,13):
    dmax=np.array([[31,28,31,30,31,30,31,31,30,31,30,31]]).T #number of days in a month
    for day in range(1,dmax[month-1]+1):
        for hour in range(24):
            for minute in range(60):
                #here, spt is spacepy.time, and '.RDT' returns the Gregorian ordinal time.
                D1=spt.Ticktock(dt.datetime(year, month, day, hour, minute, 0,0),'UTC').RDT #lower boundary of a minute
                D2=spt.Ticktock(dt.datetime(year, month, day, hour, minute, 59,999999),'UTC').RDT #upper boundary of a minute
                mask=((time>D1)&(time<D2))
                electrons_logic=electrons[mask]
                k=(month-1)*dmax[month-1]*24*60+(day-1)*24*60+hour*60+(minute+1) #number of the minute in a year
                g[k,0]=np.nanmean(electrons_logic)
Is there a way to avoid the nested loops and make it faster?
Maybe there is a way to make it faster using multiprocessing/parallel computing?
Whenever you have a problem regarding iteration, think of itertools.
from itertools import product
dmax=np.array([[31,28,31,30,31,30,31,31,30,31,30,31]]).T
for month in range (1,13):
    # dmax is a column vector, so [month-1, 0] picks out the scalar day count
    for day, hour, minute in product(range(1,dmax[month-1, 0]+1), range(24), range(60)):
        ...
I also advise defining dmax outside the loop, as it would otherwise be instantiated on each month iteration.
The alternative (at least for the 3 inner loops) is to loop on the number of minutes, then use division+remainder to compute hour and day:
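dmax=np.array([[31,28,31,30,31,30,31,31,30,31,30,31]]).T #number of days in a month
for month in range (1,13):
    nb_days = dmax[month-1, 0] #scalar number of days in this month
    for m in range(60*24*nb_days):
        total_hours,minute = divmod(m,60) #minutes -> (hours since start of month, minute)
        day,hour = divmod(total_hours,24) #hours -> (0-based day, hour of the day)
        day += 1 #days are 1-based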
This is a tradeoff between doing two divisions/modulos at each iteration (the divmod function returns quotient and remainder in one call) and running two extra nested loops. As Python loops are expensive, it's worth trying.
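As a quick sanity check of the flattened-index idea, here is a minimal sketch that divides by 60 and then by 24, and compares the result with the nested-loop order. The helper name `split_minute_index` and the choice of a 28-day month are only illustrative assumptions.

```python
from itertools import product


def split_minute_index(m):
    """Map a 0-based minute-of-month index to (day, hour, minute); day is 1-based."""
    total_hours, minute = divmod(m, 60)   # minutes -> hours since month start + minute
    day, hour = divmod(total_hours, 24)   # hours -> 0-based day + hour of the day
    return day + 1, hour, minute


nb_days = 28  # illustrative: a 28-day month
flat = [split_minute_index(m) for m in range(60 * 24 * nb_days)]
nested = list(product(range(1, nb_days + 1), range(24), range(60)))
print(flat == nested)
```

Both versions visit the same (day, hour, minute) triples in the same order, so the single loop can stand in for the three inner loops.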
If there is any static initialization, such as this line in your code:
dmax=np.array([[31,28,31,30,31,30,31,31,30,31,30,31]]).T #number of days in a month
it should be outside the for-loop. Otherwise, every time that loop runs, the array is built again for no benefit.
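One possible way to build that table once, before any loop, is sketched below. It assumes the standard-library `calendar` module is acceptable in place of the hard-coded array; the helper name `days_per_month` is made up for illustration.

```python
import calendar


def days_per_month(year):
    """Return a 12-item list with the number of days in each month of `year`."""
    # calendar.monthrange returns (weekday of the 1st, number of days in the month).
    return [calendar.monthrange(year, month)[1] for month in range(1, 13)]


dmax = days_per_month(2001)  # built once, outside every loop
print(dmax)
```

This also accounts for leap years, which a fixed list with 28 days for February would not.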
It looks like you're not using month, day and minute for anything other than building the time boundaries of each minute.
You can do it in just one loop with something like this, without even having to hard-code the days-in-a-month array:
year=2001
DT1=dt.datetime(year, 1, 1, 0, 0, 0, 0) #lower boundary of the first minute
DT2=dt.datetime(year, 1, 1, 0, 0, 59, 999999) #upper boundary of the first minute
DToneMin=dt.timedelta(minutes=1) #step from one minute window to the next
DTy=dt.datetime(year+1, 1, 1, 0, 0, 0, 0)-DT1 #length of the year
for k in range (1,int(DTy.total_seconds())//60+1): #k = number of the minute in the year
    D1=spt.Ticktock(DT1,'UTC').RDT
    DT1+=DToneMin
    D2=spt.Ticktock(DT2,'UTC').RDT
    DT2+=DToneMin
    g[k,0]=np.nanmean(electrons[(time>D1)&(time<D2)])
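Since each pass of that loop still scans the whole year of data with a boolean mask, a different technique may be worth trying: compute the minute number of every sample once and accumulate sums and counts with `np.bincount`. The sketch below is not what the answer above does. It assumes `time` holds plain datetime.datetime objects, uses 0-based minute numbers, and skips spacepy entirely; the function name `minute_means` is hypothetical.

```python
import datetime as dt
import numpy as np


def minute_means(times, values, year):
    """Average `values` into one bin per minute of `year`, ignoring NaNs."""
    start = dt.datetime(year, 1, 1)
    n_minutes = int((dt.datetime(year + 1, 1, 1) - start).total_seconds()) // 60
    # 0-based minute-of-year index for every sample.
    idx = np.array([int((t - start).total_seconds() // 60) for t in times],
                   dtype=np.int64)
    values = np.asarray(values, dtype=float)
    # Drop NaN samples and anything outside the requested year.
    keep = ~np.isnan(values) & (idx >= 0) & (idx < n_minutes)
    sums = np.bincount(idx[keep], weights=values[keep], minlength=n_minutes)
    counts = np.bincount(idx[keep], minlength=n_minutes)
    means = np.full(n_minutes, np.nan)  # minutes without data stay NaN
    np.divide(sums, counts, out=means, where=counts > 0)
    return means
```

Calling `minute_means(time, electrons, 2001)` would return one value per minute of the year. Unlike the strict `time>D1` comparison in the loop, a sample that falls exactly on a minute boundary is counted in the minute it starts.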