
Understanding rolling correlation in pandas

I am trying to understand how pandas.rolling_corr actually calculates rolling correlations. So far I have always done this with numpy. I would prefer to use pandas for its speed and ease of use, but I cannot reproduce the rolling correlation that I get from numpy.

I start with two numpy arrays:

c = np.array([1,2,3,4,5,6,7,8,9,8,7,6,5,4,3,2,1])
d = np.array([8,9,8])

Now I want to calculate the cross-correlation of d with each length-3 window of my array c. I define a rolling window function:

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
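For reference, NumPy 1.20 and later ship np.lib.stride_tricks.sliding_window_view, which builds the same windows without hand-written stride arithmetic. A minimal sketch, assuming such a NumPy version is installed:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

c = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 3, 2, 1])

# One row per length-3 window; the result is a read-only view, not a copy.
windows = sliding_window_view(c, window_shape=3)
print(windows.shape)
print(windows[:2])
```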

and calculate the correlation between each of my generated windows and the second original dataset. This approach works just fine:

for win in rolling_window(c, len(d)):
    print(np.correlate(win, d))

Outputs:

[50]
[75]
[100]
[125]
[150]
[175]
[200]
[209]
[200]
[175]
[150]
[125]
[100]
[75]
[50]
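Note that np.correlate on two equal-length inputs, in its default "valid" mode, returns a single dot product, not a normalized (Pearson) correlation coefficient. The same fifteen numbers can be obtained in one call by passing the whole array, as this sketch shows:

```python
import numpy as np

c = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 3, 2, 1])
d = np.array([8, 9, 8])

# mode="valid" keeps only the positions where d fully overlaps c,
# i.e. one dot product per length-3 window of c.
sliding = np.correlate(c, d, mode="valid")
print(sliding)
```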

If I attempt to solve it with pandas:

a = pd.DataFrame([1,2,3,4,5,6,7,8,9,8,7,6,5,4,3,2,1])
b = pd.DataFrame([8,9,8])

no matter whether I use the DataFrame rolling method:

a.rolling(window=3, center=True).corr(b)

or the module-level pd.rolling_corr:

pd.rolling_corr(a, b, window=1, center=True)

I just get a bunch of NaNs:

      0
0   NaN
1   0.0
2   NaN
3   NaN
4   NaN
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
10  NaN
11  NaN
12  NaN
13  NaN
14  NaN
15  NaN
16  NaN

Can someone give me a hand? I am able to solve the problem with numpy by flattening the array obtained from the pandas DataFrame:

a.values.ravel()

However, I would like to do the calculation entirely in pandas. I have searched the documentation but haven't found the answer I am looking for. What am I missing or not understanding?

Thank you very much in advance.

D.

The computation you're trying to do can be thought of as operating on the following dataframe:

pd.concat([a, b], axis=1)
    0   0
0   1   8
1   2   9
2   3   8
3   4 NaN
4   5 NaN
5   6 NaN
6   7 NaN
7   8 NaN
8   9 NaN
9   8 NaN
10  7 NaN
11  6 NaN
12  5 NaN
13  4 NaN
14  3 NaN
15  2 NaN
16  1 NaN

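pandas aligns a and b on their index, so with window=3 only the first three values of b line up with the first three values of a. Every other window contains NaN and therefore yields NaN, and the one computed value is placed at the center of its window (center=True). That value is also a Pearson correlation coefficient, not the sliding dot product that np.correlate returns, which is why it is 0.0 rather than 50.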
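The alignment can be checked directly. The sketch below uses Series instead of single-column DataFrames (an assumption made to keep the output simple) and the .rolling() method available in pandas 0.18 and later. It counts the rows of b that are NaN after alignment and compares the one surviving rolling value with np.corrcoef on the first window:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 3, 2, 1], dtype=float)
b = pd.Series([8, 9, 8], dtype=float)

# b is aligned to a's index, so rows 3..16 of b become NaN.
aligned = pd.concat([a, b], axis=1)
missing_in_b = int(aligned.iloc[:, 1].isna().sum())

# Rolling Pearson correlation: only the window covering rows 0..2 is complete.
rolled = a.rolling(window=3, center=True).corr(b)
valid = rolled.dropna()

# Pearson correlation of the first window, computed independently.
pearson_first = np.corrcoef([1, 2, 3], [8, 9, 8])[0, 1]

print(missing_in_b, list(valid.index), pearson_first)
```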

You can try:

pd.rolling_apply(a, window=3, func=lambda x: np.correlate(x, b[0]))

Output:

      0
0   NaN
1   NaN
2    50
3    75
4   100
5   125
6   150
7   175
8   200
9   209
10  200
11  175
12  150
13  125
14  100
15   75
16   50

You can add center=True here too if you'd like.

(I'm using pandas 0.17.0)
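pd.rolling_apply was deprecated in pandas 0.18 and removed in later releases, so the call above fails on a current install. A rough equivalent with the .rolling() method might look like the following. It assumes pandas 0.23 or later (for the raw argument) and uses np.dot, which for equal-length inputs gives the same number as np.correlate but returns a scalar:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 3, 2, 1], dtype=float)
d = np.array([8, 9, 8], dtype=float)

# raw=True passes each window to the function as a plain NumPy array.
# The first len(d) - 1 rows stay NaN because their windows are incomplete.
result = a.rolling(window=len(d)).apply(lambda w: np.dot(w, d), raw=True)
print(result)
```

Passing center=True to rolling() moves each value from the last row of its window to the middle row.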
