Python: improving function speed

I am writing my own script to calculate the relation between two signals. To do this, I use the mlab.csd and mlab.psd functions to compute the CSD and PSD of the signals. My array x has the shape (120,68,68,815). The script takes several minutes to run, and this function is the hotspot that accounts for most of that time.

Does anyone have an idea what I should do? I am not that familiar with improving script performance. Thanks!

import pickle

import numpy as np
from matplotlib import mlab

# to read the list of stcs for all the epochs
with open('/home/daniel/Dropbox/F[...]', 'rb') as f:
    label_ts = pickle.load(f)

x = np.asarray(label_ts)
nfft = 512
n_freqs = nfft // 2 + 1  # integer division, so the result is usable as an array dimension
n_epochs = len(x) # in this case there are 120 epochs
channels = 68
sfreq = 1017.25

def compute_mean_psd_csd(x, n_epochs, nfft, sfreq):
    '''Computes mean of PSD and CSD for signals.'''

    Rxy = np.zeros((n_epochs, channels, channels, n_freqs), dtype=complex)
    Rxx = np.zeros((n_epochs, channels, channels, n_freqs))
    Ryy = np.zeros((n_epochs, channels, channels, n_freqs))
    for i in xrange(0, n_epochs):
        print('computing connectivity for epoch %s'%(i+1))
        for j in xrange(0, channels):
            for k in xrange(0, channels):
                Rxy[i,j,k], freqs = mlab.csd(x[j], x[k], NFFT=nfft, Fs=sfreq)
                Rxx[i,j,k], _____ = mlab.psd(x[j], NFFT=nfft, Fs=sfreq)
                Ryy[i,j,k], _____ = mlab.psd(x[k], NFFT=nfft, Fs=sfreq)

    Rxy_mean = np.mean(Rxy, axis=0, dtype=np.float32)
    Rxx_mean = np.mean(Rxx, axis=0, dtype=np.float32)
    Ryy_mean = np.mean(Ryy, axis=0, dtype=np.float32)

    return freqs, Rxy, Rxy_mean, np.real(Rxx_mean), np.real(Ryy_mean)

One thing that could help, if the csd and psd methods are computationally intensive: you can probably cache the results of previous calls and reuse them instead of computing the same thing multiple times.

As it stands, the loops run 120 * 68 * 68 = 554880 iterations.
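To make the scale concrete, here is a small sketch of the arithmetic using the sizes given in the question. It is written in Python 3, whereas the question's code is Python 2. The count of distinct PSDs rests on an assumption that is not stated in the question: that each PSD depends only on one channel of one epoch.

```python
# Sizes taken from the question: 120 epochs, 68 channels.
n_epochs = 120
channels = 68

# One pass through the innermost loop body per (epoch, j, k) triple.
iterations = n_epochs * channels * channels

# Each pass makes one csd call and two psd calls.
psd_calls_without_cache = 2 * iterations

# Assumption: a PSD depends only on one channel of one epoch,
# so at most this many distinct PSDs can exist.
distinct_psds = n_epochs * channels

print(iterations)               # 554880
print(psd_calls_without_cache)  # 1109760
print(distinct_psds)            # 8160
```

Under that assumption, the large majority of the psd calls recompute a value that was already available.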

For the psd calculation, it should be possible to cache the values without any problem, since the method depends on only one signal.

Store the values in a dict and check whether the value for x[j] or x[k] already exists. If it doesn't exist, compute it and store it. If it exists, skip the computation and reuse the stored value.

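# cache_psd = {} must be created before the loops. The keys are the channel
# indices, because NumPy arrays are not hashable and cannot be dict keys.
if j not in cache_psd:
    cache_psd[j], ____ = mlab.psd(x[j], NFFT=nfft, Fs=sfreq)
Rxx[i,j,k] = cache_psd[j]

if k not in cache_psd:
    cache_psd[k], ____ = mlab.psd(x[k], NFFT=nfft, Fs=sfreq)
Ryy[i,j,k] = cache_psd[k]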
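The idea can be tried out in isolation with the following Python 3 sketch. A NumPy array cannot serve as a dict key (it is not hashable), so the cache here is keyed by the channel index. The names fill_psd_arrays and toy_psd are hypothetical and exist only for this illustration; toy_psd is a stand-in that returns a (spectrum, freqs) pair in the same order as mlab.psd, so the example runs without matplotlib.

```python
def fill_psd_arrays(signals, psd_func):
    """Return (Rxx, Ryy, n_calls), calling psd_func once per channel.

    signals  : sequence of per-channel signals (stand-in for x)
    psd_func : callable returning (spectrum, freqs), like mlab.psd
    """
    n = len(signals)
    cache_psd = {}   # channel index -> spectrum
    n_calls = 0
    Rxx = [[None] * n for _ in range(n)]
    Ryy = [[None] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            for idx in (j, k):
                if idx not in cache_psd:
                    cache_psd[idx], _ = psd_func(signals[idx])
                    n_calls += 1
            Rxx[j][k] = cache_psd[j]
            Ryy[j][k] = cache_psd[k]
    return Rxx, Ryy, n_calls


def toy_psd(signal):
    """Hypothetical stand-in for mlab.psd: (squared samples, sample indices)."""
    return [s * s for s in signal], list(range(len(signal)))


signals = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Rxx, Ryy, n_calls = fill_psd_arrays(signals, toy_psd)
print(n_calls)      # 3 calls instead of 2 * 3 * 3 = 18
print(Rxx[1][2])    # [9.0, 16.0]
print(Ryy[1][2])    # [25.0, 36.0]
```

With the real library, psd_func could be something like lambda s: mlab.psd(s, NFFT=nfft, Fs=sfreq), and the nested lists would be replaced by the preallocated NumPy arrays from the question.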

You can do the same with the csd method. I don't know enough about it to say more. If the order of the parameters doesn't matter, you can store the two parameters in sorted order to prevent duplicates such as 2, 1 and 1, 2.
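For a cross-spectrum the order of the arguments does matter, but only up to complex conjugation: swapping the two signals gives the complex conjugate of the original result. The Python 3 sketch below uses that property, caching under a sorted index pair and conjugating when the requested order is reversed. The names fill_csd_array and toy_csd are hypothetical; toy_csd is a stand-in that computes conj(a) * b per sample and returns a (spectrum, freqs) pair, so the example runs without matplotlib.

```python
def fill_csd_array(signals, csd_func):
    """Return (Rxy, n_calls), calling csd_func once per unordered channel pair."""
    n = len(signals)
    cache_csd = {}   # (low index, high index) -> cross-spectrum
    n_calls = 0
    Rxy = [[None] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            key = (min(j, k), max(j, k))
            if key not in cache_csd:
                cache_csd[key], _ = csd_func(signals[key[0]], signals[key[1]])
                n_calls += 1
            spectrum = cache_csd[key]
            if j > k:
                # Swapped order: take the complex conjugate of the stored value.
                spectrum = [value.conjugate() for value in spectrum]
            Rxy[j][k] = spectrum
    return Rxy, n_calls


def toy_csd(a, b):
    """Hypothetical stand-in for mlab.csd: (conj(a) * b per sample, indices)."""
    return [p.conjugate() * q for p, q in zip(a, b)], list(range(len(a)))


signals = [[1 + 2j, 3 - 1j], [2 - 1j, 1j], [4j, 1 + 1j]]
Rxy, n_calls = fill_csd_array(signals, toy_csd)
print(n_calls)  # 6 calls instead of 3 * 3 = 9
```

For n channels this needs n * (n + 1) / 2 csd calls instead of n * n. With NumPy arrays, np.conj(spectrum) would replace the list comprehension.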

The cache will make the code faster only if the memory access time is lower than the computation time and storing time. This fix could easily be added with a module that does memoization.
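One such option in the Python 3 standard library is functools.lru_cache. The sketch below memoizes on the channel index rather than on the signal itself, because lru_cache requires hashable arguments and NumPy arrays are not hashable. The names make_cached_psd and toy_psd are hypothetical; toy_psd stands in for mlab.psd and records each call, so the saving is visible.

```python
from functools import lru_cache


def make_cached_psd(signals, psd_func):
    """Return a function index -> spectrum that computes each spectrum once."""

    @lru_cache(maxsize=None)
    def psd_of_channel(index):
        spectrum, _ = psd_func(signals[index])
        return spectrum

    return psd_of_channel


calls = []


def toy_psd(signal):
    """Hypothetical stand-in for mlab.psd that records every call."""
    calls.append(1)
    return [s * s for s in signal], list(range(len(signal)))


signals = [[1.0, 2.0], [3.0, 4.0]]
psd_of_channel = make_cached_psd(signals, toy_psd)

for j in range(len(signals)):
    for k in range(len(signals)):
        psd_of_channel(j)
        psd_of_channel(k)

print(len(calls))                   # 2
print(psd_of_channel.cache_info())  # CacheInfo(hits=6, misses=2, maxsize=None, currsize=2)
```

The cached function returns the same object on every hit, so callers should treat the result as read-only.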

Here's an article about memoization for further reading:

http://www.python-course.eu/python3_memoization.php
