
Fastest way to find exchange times in Python

I have two lists of times. Starting from each point in list1, I want to find the closest subsequent (greater) time in list2 and return the gap between the two.

For example:

list1 = [280, 290]

list2 = [282, 295]

exchange(list1, list2) = [2, 5]
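To pin down the intended semantics, here is a minimal sketch using the standard-library bisect module (the function name exchange_gaps is hypothetical, and it assumes list2 is sorted and every list1 time has a later list2 time). bisect_right returns the index of the first element strictly greater than the query, which is exactly the "subsequent (greater) time".

```python
from bisect import bisect_right


def exchange_gaps(list1, list2):
    """Gap from each time in list1 to the first strictly later time in list2.

    Hypothetical helper: assumes list2 is sorted ascending and that every
    list1 time has at least one later time in list2.
    """
    result = []
    for t in list1:
        i = bisect_right(list2, t)  # index of the first list2 element > t
        result.append(list2[i] - t)
    return result


print(exchange_gaps([280, 290], [282, 295]))  # [2, 5]
```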

I'm having trouble doing this quickly. The only way I can think of is to loop through each element in list1 and take the first element of list2 that is greater than it (both lists are sorted). Here are my two attempts, one without pandas and one with pandas:
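Because both lists are sorted, the search into list2 never has to restart from the beginning: once a list2 time is too early for one list1 point, it is too early for every later point as well. A single forward pointer therefore suffices. This is the standard two-pointer merge, suggested here as an alternative (it is not from the question); it runs in O(len(list1) + len(list2)). The name exchange_two_pointer is hypothetical, and points with no later time are skipped.

```python
def exchange_two_pointer(xs, ys):
    """Gap from each x in xs to the first y in ys with y > x.

    Both inputs must be sorted ascending. x values with no later y are skipped.
    """
    gaps = []
    j = 0
    for x in xs:
        # xs is non-decreasing, so j only ever moves forward
        while j < len(ys) and ys[j] <= x:
            j += 1
        if j == len(ys):
            break  # no later y for this x, hence none for any larger x either
        gaps.append(ys[j] - x)
    return gaps


print(exchange_two_pointer([280, 290], [282, 295]))  # [2, 5]
```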

import itertools

# dictionary containing my two lists
transition_trj = {'ALA19': [270.0, 280.0, 320.0, 330.0, 440.0, 450.0, 470.0],
                  'ALA88': [275.0, 285.0, 325.0, 333.0, 445.0, 455.0, 478.0]}
# for example, exchange times for ('ALA19', 'ALA88') = [5.0, 5.0, 5.0, 3.0, 5.0, 5.0, 8.0]
observation_t = 1.0  # placeholder: the real value is not given in the question

# find all possible (unordered) pairs of names, including self-pairs
names = list(transition_trj.keys())
name_pairs = list(itertools.combinations_with_replacement(names, 2))

# non-pandas loop, takes 1.59 s

def exchange(Xk, Yk):  # e.g. Xk = 'ALA19', Yk = 'ALA88'
    Xv = transition_trj[Xk]
    Yv = transition_trj[Yk]
    pair = (Xk, Yk)
    XY_exchange = []  # one gap per X point that has a later Y point
    for xpoint in Xv:  # over all transitions in X
        # scan Y from the start; the first hit is the minimum because Yv is sorted
        for ypoint in Yv:
            if ypoint > xpoint:
                XY_exchange.append(ypoint - xpoint)
                break
    ET = sum(XY_exchange) * (1 / observation_t)
    return pair, ET




# pandas loop, does the same thing, takes 11.58 s... I am new to pandas...
import pandas as pd
df = pd.DataFrame(data=transition_trj)

def exchange(dihx, dihy):
    pair = (dihx, dihy)
    exchange_times = []
    for x in range(len(df)):
        # each df.loc[...] scalar lookup goes through pandas' indexing machinery,
        # which is much slower than indexing a plain Python list
        xpoint = df.loc[x, dihx]
        for y in range(len(df)):
            ypoint = df.loc[y, dihy]
            if ypoint > xpoint:  # first later time in dihy
                exchange_times.append(ypoint - xpoint)
                break
    ET = sum(exchange_times) * (1 / observation_t)
    return pair, ET


# here's where I call the def, just for context.
exchange_times = {}
for nm in name_pairs:
    pair, ET = exchange(nm[0],nm[1])
    exchange_times[pair] = ET
    if nm[0] != nm[1]:
        pair2, ET2 = exchange(nm[1], nm[0])
        exchange_times[pair2] = ET2 
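The calling loop above ends up visiting every ordered pair of names exactly once, self-pairs included. The small check below (using only the two keys from the question) shows that this is the same set of pairs itertools.product produces, so the loop could iterate over itertools.product(names, repeat=2) directly.

```python
import itertools

names = ['ALA19', 'ALA88']  # the two keys from the question

# ordered pairs as built by the calling loop: each combination plus its reverse
loop_pairs = []
for a, b in itertools.combinations_with_replacement(names, 2):
    loop_pairs.append((a, b))
    if a != b:
        loop_pairs.append((b, a))

# the same ordered pairs in a single call
product_pairs = list(itertools.product(names, repeat=2))

print(sorted(loop_pairs) == sorted(product_pairs))  # True
```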

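I propose a solution with np.searchsorted (NumPy is the foundation pandas is built on), which finds the insertion points of the values of one sorted array into another. Each lookup is a binary search, so the whole computation is O(N log N), whereas yours is O(N²) because the inner loop (for mini in Xv:) searches for the minimum from the beginning of the list for every point.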
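One detail matters for "strictly greater": by default np.searchsorted uses side='left', which returns the first index i with ys[i] >= x, so a time that appears in both lists would give a gap of 0. With side='right' it returns the first index with ys[i] > x. The toy arrays below are made up to show the difference.

```python
import numpy as np

ys = np.array([275.0, 285.0, 325.0])
xs = np.array([270.0, 285.0])  # 285.0 also appears in ys

left = np.searchsorted(ys, xs)                 # first index with ys[i] >= x
right = np.searchsorted(ys, xs, side='right')  # first index with ys[i] >  x

print(left)   # [0 1]
print(right)  # [0 2]
```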

It works on your example, but I don't know what you want if the two lists do not have the same length or are not neatly interleaved. Nevertheless, here is a solution for the case where the lengths are equal.
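For lists of different lengths, or several list1 times falling before the same list2 time, one option is to work on plain NumPy arrays and drop the points that have no later time. Both loops in the question already skip such points (they only append when a greater time is found), so this sketch assumes that is the desired behaviour. The function name exchange_gaps_np is hypothetical.

```python
import numpy as np


def exchange_gaps_np(xs, ys):
    """Gap from each x to the first strictly later y, for sorted inputs of any lengths.

    x values with no later y are dropped (assumed behaviour, matching the
    question's loops, which only record a gap when a later time exists).
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    pos = np.searchsorted(ys, xs, side='right')  # index of the first y > x
    has_later = pos < len(ys)                    # False where x is past the last y
    return ys[pos[has_later]] - xs[has_later]


print(exchange_gaps_np([1, 2, 3, 10], [4, 5]))  # [3. 2. 1.]
```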

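import numpy as np
import pandas as pd

df = pd.DataFrame(transition_trj)
# side='right' gives the index of the first ALA88 time strictly greater than each ALA19 time
pos = np.searchsorted(df['ALA88'], df['ALA19'], side='right')
print(df['ALA88'].iloc[pos].reset_index(drop=True) - df['ALA19'])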

# 0    5.0
# 1    5.0
# 2    5.0
# 3    3.0
# 4    5.0
# 5    5.0
# 6    8.0
# dtype: float64
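The same searchsorted step can replace the per-pair Python loop entirely. The sketch below converts each list to an array once, then computes ET for every ordered pair of names, keeping the question's formula sum(gaps) * (1 / observation_t). The value of observation_t is a hypothetical placeholder, since the question does not give it, and x values with no later y are skipped as in the question's loops.

```python
import itertools

import numpy as np

transition_trj = {'ALA19': [270.0, 280.0, 320.0, 330.0, 440.0, 450.0, 470.0],
                  'ALA88': [275.0, 285.0, 325.0, 333.0, 445.0, 455.0, 478.0]}
observation_t = 1.0  # hypothetical value; not given in the question

# convert each list once so the pair loop does no repeated conversions
arrays = {name: np.asarray(times, dtype=float) for name, times in transition_trj.items()}

exchange_times = {}
for dihx, dihy in itertools.product(arrays, repeat=2):
    xs, ys = arrays[dihx], arrays[dihy]
    pos = np.searchsorted(ys, xs, side='right')  # first y strictly greater than each x
    has_later = pos < len(ys)                    # skip x values with no later y
    gaps = ys[pos[has_later]] - xs[has_later]
    exchange_times[(dihx, dihy)] = gaps.sum() * (1 / observation_t)

print(exchange_times)
```

Only the innermost search is vectorized here; the loop over name pairs stays in Python, which is cheap because it runs once per pair rather than once per time point.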

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact:yoyou2525@163.com.

© 2020-2024 STACKOOM.COM