
Efficient calculation of rolling pearson correlation

As in the question "Calculating rolling correlation of pandas dataframes", I need to compute the correlation of an array of length N with each length-N window of a second array of length M.

import numpy as np

x = np.random.randint(0, 100, 10000)
y = [4, 5, 4, 5]
corrs = []
# Correlate y with every window of x that has the same length as y
for i in range(len(x) - len(y) + 1):
    corrs.append(np.corrcoef(x[i:i + len(y)], y)[0, 1])

Every similar question I find discusses how to do this for matrices, NxK against MxK. However, the approaches I tried do not work for 1-D data. The linked question suggests rolling over the pandas frame, which is pretty slow. Is there a faster way to calculate this?

The above code takes around 0.4s and the code from the example link takes 1.6s:

import pandas as pd  # x is a NumPy array here, so wrap it in a Series before rolling
corr = pd.Series(x).rolling(len(y)).apply(lambda w: np.corrcoef(w, y)[0, 1], raw=False).dropna()

Is there a much more efficient way to do this?

Store your correlation coefficients in a preallocated NumPy array instead of a regular Python list, so the list does not have to grow on every append:

corrs = np.zeros(len(x) - len(y) + 1)
for i in range(len(x) - len(y) + 1):
    corrs[i] = np.corrcoef(x[i:i + len(y)], y)[0, 1]
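
The loop above still calls np.corrcoef once per window. A different technique, not part of the answer above, is to build all windows at once and evaluate the Pearson formula with array operations. The following is a minimal sketch, assuming NumPy 1.20 or later (for sliding_window_view); the function name rolling_pearson is made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_pearson(x, y):
    """Pearson correlation of y with every len(y)-sized window of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # View of shape (len(x) - len(y) + 1, len(y)); no data is copied here.
    windows = sliding_window_view(x, y.size)
    # Center each window and the template.
    xc = windows - windows.mean(axis=1, keepdims=True)
    yc = y - y.mean()
    num = xc @ yc
    den = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum())
    # A constant window has zero variance; the result there is NaN,
    # which matches what np.corrcoef returns for that case.
    with np.errstate(divide="ignore", invalid="ignore"):
        return num / den


rng = np.random.default_rng(0)
x = rng.integers(0, 100, 10000)
y = [4, 5, 4, 5]
corrs = rolling_pearson(x, y)
```

The centered copy xc takes memory proportional to the number of windows times len(y), so this trades memory for speed. It should fit comfortably for a short y like the one here, but may not for very long templates.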

