
Fastest way to iterate 2 arrays and perform operations

I have a function that takes two arrays of zeros and ones as input, about 8000 elements per array. My function eps calculates a statistic on these arrays and returns the result. The operations are simple: check for 0 and record the index at which each 0 occurs. I tried my best to optimize for speed, but the best I could get is 4.5 to 5 seconds (for 18k array pairs), measured with the timeit library. Speed matters because I need to run this function on billions of array pairs.

# e.g. inputs
# ts_1 = [0,1,1,0,0,1,1,0,......]
# ts_2 = [1,1,1,1,1,1,1,0,......]
# tau = any integer or float

import math
from itertools import product

import numpy as np

def eps(ts_1, ts_2, tau):
    
    n1 = 0
    n2 = 0
    Q_tau = 0
    q_tau = 0

    event_index1 = [index for index, item in enumerate(ts_1) if item == 0]
    n1 = ts_1.count(0)
    event_index2 = [index for index, item in enumerate(ts_2) if item == 0]
    n2 = ts_2.count(0)


    # tried numpy based on @Ram comment below, no improvement
    event_index1, = np.where(np.array(ts_1) == 0)
    n1 = event_index1.shape[0]
    
    event_index2, = np.where(np.array(ts_2) == 0)
    n2 = event_index2.shape[0]
    # tried numpy based on @Ram comment below, no improvement   

    if (n1 == 0 or n2 == 0):
        Q_tau = 0
    else:
        c_ij = 0  
        matching_idx = set(event_index1).intersection(event_index2)
        c_ij = c_ij + (0.5 *len(matching_idx) )
        
        for x,y in product(event_index1,event_index2):
            if x-y > 0 and (x-y)<= tau:
                c_ij = c_ij +1  
        
        c_ji = 0            
        matching_idx_2 = set(event_index2).intersection(event_index1)         
        c_ji = c_ji + (0.5 *len(matching_idx_2) )
        
        for x,y in product(event_index2,event_index1):
            if x-y > 0 and (x-y)<= tau:
                c_ji = c_ji +1                       
                  
        Q_tau = (c_ij+c_ji)/math.sqrt( n1 * n2 )
        q_tau = (c_ij - c_ji)/math.sqrt( n1 * n2 )
    
    return Q_tau, q_tau

Based on my earlier comments, and given that swapping the two lists passed to product yields the same pairs with their elements reversed, you can count both directions in a single loop and reduce your code to:

import math
from itertools import product

def eps(ts_1, ts_2, tau):
    Q_tau = 0
    q_tau = 0

    # Positions of the events (zeros) in each series
    event_index1 = [index for index, item in enumerate(ts_1) if item == 0]
    n1 = len(event_index1)
    event_index2 = [index for index, item in enumerate(ts_2) if item == 0]
    n2 = len(event_index2)

    if n1 != 0 and n2 != 0:
        # Simultaneous events count 0.5 towards each direction
        matching_idx = set(event_index1).intersection(event_index2)
        c_ij = c_ji = 0.5 * len(matching_idx)

        # One pass over all pairs covers both directions
        for x, y in product(event_index1, event_index2):
            if 0 < x - y <= tau:
                c_ij += 1
            elif 0 < y - x <= tau:
                c_ji += 1

        Q_tau = (c_ij + c_ji) / math.sqrt(n1 * n2)
        q_tau = (c_ij - c_ji) / math.sqrt(n1 * n2)

    return Q_tau, q_tau
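
The single loop above still visits every pair, so its cost grows with n1 * n2. As a rough sketch of one way to avoid that, the event indices can be kept as sorted NumPy arrays and the pairs within tau counted with np.searchsorted. The name eps_vectorized and the helper count_lagged are made up for this illustration, and the sketch assumes the inputs are plain sequences of 0s and 1s, so that event indices are sorted, unique integers. It is not benchmarked here, so any speed-up should be measured on real data.

```python
import math
import numpy as np

def eps_vectorized(ts_1, ts_2, tau):
    """Hypothetical variant of eps that avoids the pairwise loop."""
    # np.flatnonzero returns the event positions in ascending order
    idx1 = np.flatnonzero(np.asarray(ts_1) == 0)
    idx2 = np.flatnonzero(np.asarray(ts_2) == 0)
    n1, n2 = idx1.size, idx2.size
    if n1 == 0 or n2 == 0:
        return 0.0, 0.0

    # Simultaneous events count 0.5 towards each direction
    shared = 0.5 * np.intersect1d(idx1, idx2, assume_unique=True).size

    def count_lagged(a, b):
        # Count pairs (x in a, y in b) with 0 < x - y <= tau,
        # i.e. x - tau <= y < x, using binary search on sorted b
        hi = np.searchsorted(b, a, side="left")        # number of y < x
        lo = np.searchsorted(b, a - tau, side="left")  # number of y < x - tau
        # Clip at 0 so a negative tau yields no matches
        return int(np.maximum(hi - lo, 0).sum())

    c_ij = shared + count_lagged(idx1, idx2)
    c_ji = shared + count_lagged(idx2, idx1)
    norm = math.sqrt(n1 * n2)
    return float((c_ij + c_ji) / norm), float((c_ij - c_ji) / norm)
```

For example, eps_vectorized([0,1,1,0,0,1,1,0], [1,1,1,1,1,1,1,0], 3) finds events at positions 0, 3, 4, 7 in the first series and at position 7 in the second. Before relying on it, compare its output against the loop version on a sample of real array pairs.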
