简体   繁体   中英

Can my numba code be faster than numpy

I am new to Numba and am trying to speed up some calculations that have proved too unwieldy for numpy. The example I've given below compares a function containing a subset of my calculations using a vectorized/numpy and numba versions of the function the latter of which was also tested as pure python by commenting out the @autojit decorator.

I find that the numba and numpy versions give similar speed ups relative to the pure python, both of which are about a factor of 10 speed improvement. The numpy version was actually slightly faster than my numba function but because of the 4D nature of this calculation I quickly run out of memory when the arrays in the numpy function are sized much larger than this toy example.

This speed up is nice but I have often seen speed ups of >100x on the web when moving from pure python to numba.

I would like to know if there is a general expected speed increase when moving to numba in nopython mode. I would also like to know if there are any components of my numba-ized function that would be limiting further speed increases.

import numpy as np                                                                    
from timeit import default_timer as timer                                             
from numba import  autojit                                                            
import math                                                                           


def vecRadCalcs(slope,  skyz, solz, skya, sola):                                      

    nloc = len(slope)                                                                 
    ntime =  len(solz)                                                                
    [lenz, lena] = skyz.shape                                                         
    asolz = np.tile(np.reshape(solz,[ntime,1,1,1]),[1,nloc,lenz,lena])                
    asola = np.tile(np.reshape(sola,[ntime,1,1,1]),[1,nloc,lenz,lena])                
    askyz = np.tile(np.reshape(skyz,[1,1,lenz,lena]),[ntime,nloc,1,1])                
    askya = np.tile(np.reshape(skya,[1,1,lenz,lena]),[ntime,nloc,1,1])                
    phi1 = np.cos(asolz)*np.cos(askyz)                                                
    phi2 = np.sin(asolz)*np.sin(askyz)*np.cos(askya- asola)                           
    phi12 = phi1 + phi2                                                               
    phi12[phi12> 1.0] = 1.0                                                           
    phi = np.arccos(phi12)                                                            

    return(phi)                                                                       


@autojit                                                                              
def RadCalcs(slope,  skyz, solz, skya, sola, phi):                                    

    nloc = len(slope)                                                                 
    ntime =  len(solz)                                                                
    pop = 0.0                                                                         
    [lenz, lena] = skyz.shape                                                         
    for iiT in range(ntime):                                                          
        asolz = solz[iiT]                                                             
        asola = sola[iiT]                                                             
        for iL in range(nloc):                                                        
            for iz in range(lenz):                                                    
                for ia in range(lena):                                                
                    askyz = skyz[iz,ia]                                               
                    askya = skya[iz,ia]                                               
                    phi1 = math.cos(asolz)*math.cos(askyz)                            
                    phi2 = math.sin(asolz)*math.sin(askyz)*math.cos(askya- asola)     
                    phi12 = phi1 + phi2                                               
                    if phi12 > 1.0:                                                   
                        phi12 = 1.0                                                   
                    phi[iz,ia] = math.acos(phi12)                                     
                    pop = pop + 1                                                     

    return(pop)                                                                       


zenith_cells = 90                                                                     
azim_cells = 360                                                                      
nloc = 10        # nominallly ~ 700                                                   
ntim = 10        # nominallly ~ 200000                                                

slope = np.random.rand(nloc) * 10.0                                                   
solz = np.random.rand(ntim) *np.pi/2.0                                                
sola = np.random.rand(ntim) * 1.0*np.pi                                               

base = np.ones([zenith_cells,azim_cells])                                             
skya =  np.deg2rad(np.cumsum(base,axis=1))                                            
skyz =  np.deg2rad(np.cumsum(base,axis=0)*90/zenith_cells)                            

phi = np.zeros(skyz.shape)                                                            

start = timer()                                                                       
outcalc = RadCalcs(slope,  skyz, solz, skya, sola, phi)                               
stop = timer()                                                                        
outcalc2 = vecRadCalcs(slope,  skyz, solz, skya, sola)                                
stopvec = timer()                                                                     

print(outcalc)                                                                        
print(stop-start)                                                                     
print(stopvec-stop)                                                                   

On my machine running numba 0.31.0, the Numba version is 2x faster than the vectorized solution. When timing numba functions, you need to run the function more than one time because the first time you're seeing the time of jitting the code + the run time. Subsequent runs will not include the overhead of jitting the functions time since Numba caches the jitted code in memory.

Also, please note that your functions are not calculating the same thing -- you want to be careful that you're comparing the same things using something like np.allclose on the results.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM