Translating Matlab to Python - Speeding up a loop

Question

I have been translating some code from Matlab to Python that we use to analyse data in our lab. We have two lists of time stamps and we want to use one to herald the other: for every element in the first list we look for time stamps in the second list that have a precise separation in time. If there are any, we place them in a separate list.

Here is a runnable example of the kind of Matlab code I am using, with random data. It is probably VERY crude, as I am not well versed in Matlab. In the following, Ctrigger is the trigger list and Csignal is the signal list that we want to herald. For every element of Ctrigger we check whether there are elements in Csignal that fall within a window centred on offset, with width gate. The selected events are placed in Hsignal.

% Matlab code

Ctrigger = linspace(0, 3000000, (3000000-1)/3);
length_t = length(Ctrigger);

Bsignal = linspace(0, 3000000, (3000000-1)/10);
length_s = length(Bsignal);
noise = reshape(20*rand(length_s,1)-10,[1,length_s]);
Csignal = Bsignal + noise;

offset = 3;
gate = 1;

Hsignal=zeros(length_s,1);
marker = 1;

tic
for j=1:length_t-1
    m = marker;
    tstart=Ctrigger(j)+offset-gate/2;
    tstop=Ctrigger(j)+offset+gate/2;
    while(m <= length_s-1)
        if(Csignal(m)<tstart)
            marker=m;
            m=m+1;
        end
        if(Csignal(m)>=tstart && Csignal(m)<=tstop)
            Hsignal(m)=Csignal(m);
            m = m+1;
        end
        if(Csignal(m)>tstop)
            break;
        end
    end
end

toc

Hsignal=Hsignal(Hsignal~=0);
Hsignal = unique(Hsignal);

Roughly 90'000 events are selected to be placed in Hsignal, and Matlab takes about 0.05 seconds to run this. I have introduced the marker counter because the two lists Csignal and Ctrigger are already ordered in time. marker is set at the start of one heralding window: when I move to the next trigger I do not search all of Csignal again, but only from the start of that window. To avoid double counting, I remove the duplicates at the end.

If you want to have an idea of the code, here is a simplified version of the input and output:

Ctrigger = [1, 10, 11, 20, 30, 40, 50, 60]
Csignal = [4, 11, 13, 17, 25, 34, 41, 42, 50, 57, 65]
print(Hsignal)
# [4, 11, 13, 41, 42]
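
For reference, the selection rule can be written as a brute-force check that reproduces the small example above. This is only a sketch for checking results, not the fast algorithm: the function name herald_brute_force is made up for illustration, and offset = 2 and gate = 2 are assumed (the example does not state them; they are the values used in Answer 1 below).

```python
def herald_brute_force(triggers, signals, offset, gate):
    """Keep every signal lying within gate/2 of (trigger + offset) for at least one trigger."""
    selected = []
    for s in signals:
        # A signal is heralded if any trigger places it inside the gate window.
        if any(abs(s - t - offset) <= gate / 2 for t in triggers):
            selected.append(s)
    return selected


Ctrigger = [1, 10, 11, 20, 30, 40, 50, 60]
Csignal = [4, 11, 13, 17, 25, 34, 41, 42, 50, 57, 65]

# offset=2 and gate=2 are assumed values (taken from Answer 1)
print(herald_brute_force(Ctrigger, Csignal, offset=2, gate=2))
# [4, 11, 13, 41, 42]
```

This version compares every signal with every trigger, so it is only practical for small inputs, but it gives a reference to compare the faster versions against.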

Now, I have copied this code from Matlab, adjusting it only slightly to fit Python. Following some advice, I first declare the function that contains the main algorithm, and then call it:

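# Python code

import random
import time

import numpy as np


def main(list1, list2, list3, delay, window):
    marker = 0  # Python indexing starts at 0 (the Matlab version starts at 1)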
    for j in range(len(list1)):
        m = marker
        t_star = list1[j] + delay - window/2
        t_sto = list1[j] + delay + window/2   
        while m < len(list2):   
            if (list2[m] < t_star):
                marker = m
                m = m + 1
            elif (list2[m] >= t_star and list2[m] <= t_sto):
                list3[m] = list2[m]
                m = m + 1
            elif (list2[m] > t_sto):
                break


Ctrigger = range(0, 3000000, 3)
length_t = len(Ctrigger)

Bsignal = range(0, 3000000, 10)
length_s = len(Bsignal)
noise = 1e-05*np.asarray(random.sample(range(-1000000,1000000), int(length_s)))
Csignal = list(np.sort(np.asarray(Bsignal) + noise))

offset = 3
gate = 1

length_t = len(Ctrigger)
length_s = len(Csignal)
# One slot per signal time stamp: main() writes list3[m], where m indexes Csignal
Hsignal = list(np.zeros(length_s))

start = time.time()

main(Ctrigger, Csignal, Hsignal, offset, gate)

end = time.time()
Hsignal = np.sort(np.asarray(list(set(Hsignal))))

print(end-start)

Similarly, about 90'000 elements are placed in Hsignal. The key problem is that Python takes about 1.1 seconds to run this! I have also tried this alternative, which removes the inner loop (here I still use arrays, since I need element-wise arithmetic on the whole list):

start = time.time()
result = list()
for event in Ctrigger:
    c = Csignal - event - offset
    d = Csignal[abs(c) <= gate/2]
    result.append(list(d))


flat = [item for sublist in result for item in sublist]
flat = np.sort(np.asarray(list(set(flat))))

end = time.time()
print(end-start)

but it's even worse: almost 10 minutes.

I can't really understand where the problem is. For my application Ctrigger is 100e06 elements long, and Csignal around 20e06. In Matlab the same code takes 1.06 seconds, against more than 10 minutes in Python. It also seems that it is not straightforward to remove the loops and speed up the process at the same time.

EDIT I: I have introduced the Matlab code I am using, as well as an executable example. I also made Hsignal a list, while Ctrigger and Csignal are still arrays. Result: 0.05s vs 6.5s

EDIT II: now I only use lists, as suggested by RiccardoBucco. Result: 0.05s vs 1.5s

EDIT III: instead of appending to Hsignal I am declaring it first, then changing individual elements, which I noticed brought a small speed up (even though it seems that keeping Hsignal as an array is faster!). Then I declared a function with the main algorithm. Result: 0.05s vs 1.1s

Answer 1

What is probably slowing down your algorithm is the use of np.append in

Hsignal = np.append(Hsignal, Csignal[m])

You should use a list, not a NumPy array:

Ctrigger = [1, 10, 11, 20, 30, 40, 50, 60]
Csignal = [4, 11, 13, 17, 25, 34, 41, 42, 50, 57, 65]

offset = 2
gate = 2

Hsignal = []
marker = 0

for j in range(len(Ctrigger)):
    m = marker
    t_start = Ctrigger[j] + offset - gate/2
    t_stop = Ctrigger[j] + offset + gate/2   
    while m < len(Csignal):   
        if Csignal[m] < t_start:
            marker = m
            m = m + 1
        elif Csignal[m] <= t_stop:
            Hsignal.append(Csignal[m])
            m = m + 1
        else:
            break

Hsignal = sorted(set(Hsignal))

Once the list has been built, you can transform it into an array:

Hsignal = np.array(Hsignal)
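
As a small illustration of why np.append is costly inside a loop: it does not grow the array in place, but returns a new array containing a copy of the old data, so each call costs time proportional to the current length. The variable names below are only for illustration.

```python
import numpy as np

arr = np.zeros(3)
grown = np.append(arr, 1.0)  # allocates a new 4-element array and copies arr into it

print(grown is arr)            # False: np.append returned a different object
print(arr.shape, grown.shape)  # (3,) (4,): the original array is unchanged

values = []
values.append(1.0)  # list.append modifies the same list in place
print(values)       # [1.0]
```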

Answer 2

How to get the runtime down to 6ms

As you have already seen, Python loops are extremely slow. By default there is no JIT compiler to speed up loops, as there is in Matlab. So you have the following possibilities:

Vectorize your code in NumPy, if possible.
Use Cython to compile the function.
Use Numba to compile the function.

In the following example I use Numba, because it is really simple to use in such cases.

Example

import numpy as np
import numba as nb

@nb.njit()
def main_nb(Ctrigger, Csignal, offset, gate):
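    # One slot per signal time stamp, because Hsignal is indexed by m below
    Hsignal = np.zeros(Csignal.shape[0])

    marker = 0  # Python indexing starts at 0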
    for j in range(Ctrigger.shape[0]):
        m = marker
        t_star = Ctrigger[j] + offset - gate/2
        t_sto = Ctrigger[j] + offset + gate/2   
        while m < Csignal.shape[0]:   
            if (Csignal[m] < t_star):
                marker = m
                m = m + 1
            elif (Csignal[m] >= t_star and Csignal[m] <= t_sto):
                Hsignal[m] = Csignal[m]
                m = m + 1
            elif (Csignal[m] > t_sto):
                break
    return Hsignal

Also, avoid lists if possible. Use simple arrays like you would in Matlab.

Timings

import time

#Use simple numpy arrays if possible, not lists
Ctrigger = np.arange(0, 3000000, 3)
length_t = Ctrigger.shape[0]

Bsignal = np.arange(0, 3000000, 10)
noise = 1e-05*np.random.rand(Bsignal.shape[0])
Csignal = np.sort(np.asarray(Bsignal) + noise)

offset = 3
gate = 1

start = time.time()
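# main_nb.py_func is the original, uncompiled Python function behind the Numba dispatcher
Hsignal=main_nb.py_func(Ctrigger, Csignal, offset, gate)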
print("Pure Python takes:" +str(time.time()-start))
#Pure Python takes:6.049151659011841

#First call takes longer (compilation overhead)
#The same may be the case in matlab
start = time.time()
Hsignal=main_nb(Ctrigger, Csignal, offset, gate)
print("First Numba run takes:" +str(time.time()-start))
#First Numba run takes:0.16272664070129395

start = time.time()
Hsignal=main_nb(Ctrigger, Csignal, offset, gate)
print("All further Numba calls run takes:" +str(time.time()-start))
#All further Numba calls run takes:0.006016731262207031

Hsignal = np.unique(Hsignal)
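
For completeness, the first option in the list (vectorizing in NumPy) could look roughly like the sketch below. It is not the loop from the question but an equivalent formulation, and it assumes that Ctrigger is sorted. A signal s is heralded when some trigger t satisfies s - offset - gate/2 <= t <= s - offset + gate/2, and np.searchsorted can count the triggers in that interval for all signals at once. The function name herald_searchsorted is made up for illustration, and no timing is claimed for it here.

```python
import numpy as np


def herald_searchsorted(Ctrigger, Csignal, offset, gate):
    """Return the signals that have at least one trigger inside their gate window.

    Assumes Ctrigger is sorted in ascending order.
    """
    Ctrigger = np.asarray(Ctrigger)
    Csignal = np.asarray(Csignal)
    # Index of the first trigger >= lower edge of the window, per signal
    lo = np.searchsorted(Ctrigger, Csignal - offset - gate / 2, side="left")
    # Index just past the last trigger <= upper edge of the window, per signal
    hi = np.searchsorted(Ctrigger, Csignal - offset + gate / 2, side="right")
    # hi - lo is the number of triggers inside the window; keep signals with at least one
    return Csignal[hi > lo]


# Small example from the question, with offset=2 and gate=2 as in Answer 1
small_trigger = [1, 10, 11, 20, 30, 40, 50, 60]
small_signal = [4, 11, 13, 17, 25, 34, 41, 42, 50, 57, 65]
print(herald_searchsorted(small_trigger, small_signal, offset=2, gate=2))
# [ 4 11 13 41 42]
```

Each signal appears at most once in the result, so no duplicate removal is needed afterwards.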

solution1
3 2019-12-28 16:31:58

solution2
3 ACCEPTED 2020-01-14 19:06:44
