
Python improving function speed

I am writing my own script to calculate the relationship between two signals. For this I use the mlab.csd and mlab.psd functions (from matplotlib) to compute the CSD and PSD of the signals. My array x has the shape (120,68,68,815). The script runs for several minutes, and this function is where most of that time is spent.

Does anyone have an idea what I could do? I am not very familiar with improving script performance. Thanks!

import pickle

import numpy as np
from matplotlib import mlab

# to read the list of stcs for all the epochs
with open('/home/daniel/Dropbox/F[...]', 'rb') as f:
    label_ts = pickle.load(f)

x = np.asarray(label_ts)
nfft = 512
n_freqs = nfft // 2 + 1  # integer division: number of one-sided frequency bins
n_epochs = len(x)  # in this case there are 120 epochs
channels = 68
sfreq = 1017.25

def compute_mean_psd_csd(x, n_epochs, nfft, sfreq):
    '''Computes mean of PSD and CSD for signals.'''

    Rxy = np.zeros((n_epochs, channels, channels, n_freqs), dtype=complex)
    Rxx = np.zeros((n_epochs, channels, channels, n_freqs))
    Ryy = np.zeros((n_epochs, channels, channels, n_freqs))
    for i in range(n_epochs):
        print('computing connectivity for epoch %s' % (i + 1))
        for j in range(channels):
            for k in range(channels):
                # x[i, j] assumes x is laid out as (n_epochs, channels, n_samples)
                Rxy[i, j, k], freqs = mlab.csd(x[i, j], x[i, k], NFFT=nfft, Fs=sfreq)
                Rxx[i, j, k], _ = mlab.psd(x[i, j], NFFT=nfft, Fs=sfreq)
                Ryy[i, j, k], _ = mlab.psd(x[i, k], NFFT=nfft, Fs=sfreq)

    # complex64 keeps the imaginary part of the CSD; float32 cannot hold complex values
    Rxy_mean = np.mean(Rxy, axis=0, dtype=np.complex64)
    Rxx_mean = np.mean(Rxx, axis=0, dtype=np.float32)
    Ryy_mean = np.mean(Ryy, axis=0, dtype=np.float32)

    return freqs, Rxy, Rxy_mean, np.real(Rxx_mean), np.real(Ryy_mean)

One thing that could help, if the csd and psd functions are computationally intensive: you could cache the results of previous calls and reuse them instead of computing the same values multiple times.

As far as I can see, the loop runs 120 * 68 * 68 = 554,880 iterations, each with one csd call and two psd calls.
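A quick count of the work involved, using the sizes stated in the question (120 epochs, 68 channels). The "unique" counts rest on two assumptions: that each PSD depends only on one (epoch, channel) signal, and that csd(b, a) is the complex conjugate of csd(a, b), so only pairs with j <= k need computing:

```python
# Rough operation count for the triple loop in compute_mean_psd_csd,
# using the sizes stated in the question.
n_epochs = 120
channels = 68

loop_iterations = n_epochs * channels * channels  # one (i, j, k) triple per iteration
spectral_calls = 3 * loop_iterations               # one csd + two psd calls per iteration

# Assumption: psd depends only on the single (epoch, channel) signal.
unique_psd = n_epochs * channels
# Assumption: csd(b, a) == conj(csd(a, b)), so only pairs with j <= k are needed.
unique_csd = n_epochs * channels * (channels + 1) // 2

print(loop_iterations, spectral_calls, unique_psd, unique_csd)
# 554880 1664640 8160 281520
```

Under these assumptions, roughly 1.66 million spectral calls shrink to about 290,000.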

For the psd calculation, it should be possible to cache the values without any problem, as the function depends on only one signal.

Store the values in a dict keyed by the signal x[j] or x[k], and check whether the key already exists. If it doesn't, compute the value and store it; if it does, skip the computation and reuse the stored value.

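cache_psd = {}  # create once, before the loops

key_j = x[j].tobytes()  # NumPy arrays are unhashable, so use their raw bytes as the key
if key_j not in cache_psd:
    cache_psd[key_j], _ = mlab.psd(x[j], NFFT=nfft, Fs=sfreq)
Rxx[i,j,k] = cache_psd[key_j]

key_k = x[k].tobytes()
if key_k not in cache_psd:
    cache_psd[key_k], _ = mlab.psd(x[k], NFFT=nfft, Fs=sfreq)
Ryy[i,j,k] = cache_psd[key_k]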
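To see the cache in a complete loop, here is a minimal sketch under a few assumptions: x is taken to have shape (epochs, channels, samples), the cache is keyed by the hashable (epoch, channel) index rather than by the array itself, and `expensive_psd` is a hypothetical NumPy stand-in for `mlab.psd` so the example runs without matplotlib. It counts how often the PSD is actually computed:

```python
import numpy as np

calls = 0  # how many times the "expensive" PSD is really evaluated

def expensive_psd(signal, nfft):
    """Hypothetical stand-in for mlab.psd: one-sided FFT periodogram.

    Not matplotlib's implementation (no windowing, detrending or scaling);
    it only exists so the example runs with NumPy alone.
    """
    global calls
    calls += 1
    spectrum = np.fft.rfft(signal, n=nfft)
    return np.abs(spectrum) ** 2

def psd_cached(x, nfft):
    """Fill Rxx/Ryy like the question's loop, computing each PSD only once."""
    n_epochs, channels, _ = x.shape
    n_freqs = nfft // 2 + 1
    Rxx = np.zeros((n_epochs, channels, channels, n_freqs))
    Ryy = np.zeros((n_epochs, channels, channels, n_freqs))
    cache_psd = {}  # (epoch, channel) -> PSD array
    for i in range(n_epochs):
        for j in range(channels):
            for k in range(channels):
                if (i, j) not in cache_psd:
                    cache_psd[i, j] = expensive_psd(x[i, j], nfft)
                if (i, k) not in cache_psd:
                    cache_psd[i, k] = expensive_psd(x[i, k], nfft)
                Rxx[i, j, k] = cache_psd[i, j]
                Ryy[i, j, k] = cache_psd[i, k]
    return Rxx, Ryy

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 815))  # small synthetic data: 2 epochs, 4 channels
Rxx, Ryy = psd_cached(x, nfft=512)
print(calls)  # 8 (2 epochs * 4 channels) instead of 2 * 2 * 4 * 4 = 64
```

Since the stored PSD for (i, j) never depends on k, the same saving could also be had by computing it once just before the inner k loop; the dict simply makes that reuse explicit.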

You can do the same with the csd function; I don't know enough about it to say more. If the order of the parameters doesn't matter, you can store the two parameters in sorted order to avoid duplicates such as (2, 1) and (1, 2).
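For the CSD, the argument order does matter, but only up to complex conjugation: a cross-spectrum of the form conj(FFT(a)) * FFT(b) satisfies csd(b, a) == conj(csd(a, b)). As far as I can tell mlab.csd is built the same way, but that is an assumption worth checking with np.allclose on your own data. The sketch below uses a hypothetical single-segment `cross_spectrum` stand-in (no windowing or scaling) and computes only the pairs with j <= k, filling the rest by conjugation:

```python
import numpy as np

def cross_spectrum(a, b, nfft):
    """Hypothetical stand-in for mlab.csd: conj(FFT(a)) * FFT(b), one segment."""
    return np.conj(np.fft.rfft(a, n=nfft)) * np.fft.rfft(b, n=nfft)

def csd_matrix_symmetric(signals, nfft):
    """Channel-by-channel CSD for one epoch, evaluating only pairs with j <= k.

    Relies on csd(b, a) == conj(csd(a, b)), which holds for this stand-in.
    """
    channels = signals.shape[0]
    n_freqs = nfft // 2 + 1
    Rxy = np.zeros((channels, channels, n_freqs), dtype=complex)
    for j in range(channels):
        for k in range(j, channels):
            Rxy[j, k] = cross_spectrum(signals[j], signals[k], nfft)
            if k != j:
                Rxy[k, j] = np.conj(Rxy[j, k])  # mirrored pair, no extra FFTs
    return Rxy

rng = np.random.default_rng(1)
signals = rng.standard_normal((5, 815))  # one synthetic epoch: 5 channels x 815 samples
Rxy = csd_matrix_symmetric(signals, nfft=512)

# Cross-check against computing every ordered pair directly.
direct = np.array([[cross_spectrum(a, b, 512) for b in signals] for a in signals])
print(np.allclose(Rxy, direct))  # True
```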

A cache only makes the code faster if looking up a stored result is cheaper than computing and storing it. This kind of fix can easily be added with a module that does memoization.
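In the standard library, functools.lru_cache is one such memoization tool. Because it needs hashable arguments, it cannot take a NumPy array directly, so this sketch (an assumption about how you might wire it up, not the only way) memoizes on the (epoch, channel) indices and reads x from the enclosing scope. The FFT periodogram is a hypothetical stand-in for `mlab.psd`:

```python
from functools import lru_cache

import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal((3, 6, 815))  # hypothetical data: (epochs, channels, samples)
NFFT = 512

@lru_cache(maxsize=None)
def psd_for(epoch, channel):
    """Memoized PSD of one signal, keyed by hashable indices.

    The FFT periodogram below is a stand-in for mlab.psd.
    """
    spectrum = np.fft.rfft(x[epoch, channel], n=NFFT)
    return np.abs(spectrum) ** 2

for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        for k in range(x.shape[1]):
            # in the real loop these would go into Rxx[i, j, k] and Ryy[i, j, k]
            pxx = psd_for(i, j)
            pyy = psd_for(i, k)

info = psd_for.cache_info()
print(info.misses, info.hits)  # 18 198: one miss per unique (epoch, channel) signal
```

The cached arrays are shared between callers, so they should not be modified in place.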

Here's an article about memoization for further reading:

http://www.python-course.eu/python3_memoization.php
