Efficient calculation of rolling Pearson correlation

Question

As in the question "Calculating rolling correlation of pandas dataframes", I need to compute the correlation between an array of length N and each length-N window of a second array of length M.

import numpy as np

x = np.random.randint(0, 100, 10000)
y = [4, 5, 4, 5]
corrs = []
# Correlate y with every window of x that has the same length as y.
for i in range(len(x) - len(y) + 1):
    corrs.append(np.corrcoef(x[i:i + len(y)], y)[0, 1])

Every similar question I find discusses how to do this for matrices, NxK against MxK. However, the approaches I have tried do not work for 1-D data. The linked question suggests rolling over the pandas frame, which is pretty slow. Is there a faster way to calculate this?

The above code takes around 0.4s and the code from the example link takes 1.6s:

import pandas as pd
corr = pd.Series(x).rolling(len(y)).apply(lambda w: np.corrcoef(w, y)[0, 1], raw=False).dropna(how='all', axis=0)

Is there a much more efficient way to do this?

Answer 1

Store your correlation coefficients in a preallocated NumPy array instead of a regular Python list, so the list does not have to grow as you append elements:

corrs = np.zeros(len(x) - len(y) + 1)  # one slot per window
for i in range(len(x) - len(y) + 1):
    corrs[i] = np.corrcoef(x[i:i + len(y)], y)[0, 1]
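
The loop above still calls np.corrcoef once per window. As a possible alternative (not part of the original answer), the sketch below computes all window correlations in one vectorized pass. It assumes NumPy 1.20 or later for sliding_window_view, and the function name rolling_pearson is made up for illustration. Constant windows have zero variance and come out as nan, as they do with np.corrcoef.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # assumes NumPy >= 1.20


def rolling_pearson(x, y):
    """Pearson correlation of y with every len(y)-sized window of x.

    Hypothetical helper for illustration; returns an array of
    length len(x) - len(y) + 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # View of shape (len(x) - len(y) + 1, len(y)); no data is copied here.
    windows = sliding_window_view(x, y.size)

    # Center each window and the template around their own means.
    wc = windows - windows.mean(axis=1, keepdims=True)
    yc = y - y.mean()

    # Pearson r = sum(wc * yc) / sqrt(sum(wc^2) * sum(yc^2)), per window.
    num = wc @ yc
    den = np.sqrt((wc ** 2).sum(axis=1) * (yc ** 2).sum())

    # Zero-variance windows give 0/0; suppress the warning and keep nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        return num / den


x = np.random.randint(0, 100, 10000)
y = [4, 5, 4, 5]
corrs = rolling_pearson(x, y)
```

The centering step allocates a full (windows x len(y)) array, so memory grows with both lengths; for a short y like the one in the question this should be small.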

solution1
-1 2019-08-07 17:47:03
