I am trying to understand how pandas.rolling_corr actually calculates rolling correlations. So far I have always done this with numpy. I would prefer to use pandas for its speed and ease of use, but I cannot get it to produce the same rolling correlation that I get from numpy.
I start with two numpy arrays:
c = np.array([1,2,3,4,5,6,7,8,9,8,7,6,5,4,3,2,1])
d = np.array([8,9,8])
Now I want to calculate the cross-correlation of d with each length-3 window of my array c. I define a rolling window function:
def rolling_window(a, window):
    # Build a strided view in which each row is one window of length `window`
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
and calculate the correlation between each of my generated windows and the second original dataset. This approach works just fine:
for win in rolling_window(c, len(d)):
    print(np.correlate(win, d))
Outputs:
[50]
[75]
[100]
[125]
[150]
[175]
[200]
[209]
[200]
[175]
[150]
[125]
[100]
[75]
[50]
If I attempt to solve it with pandas:
a = pd.DataFrame([1,2,3,4,5,6,7,8,9,8,7,6,5,4,3,2,1])
b = pd.DataFrame([8,9,8])
No matter whether I use the DataFrame's rolling corr:
a.rolling(window=3, center=True).corr(b)
or Pandas rolling_corr:
pd.rolling_corr(a, b, window=1, center=True)
I just get a bunch of NaNs:
0
0 NaN
1 0.0
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
Can someone give me a hand? I am able to solve the problem with numpy by flattening the numpy array obtained from the pandas DataFrame:
a.values.ravel()
However, I would like to do the calculation entirely in pandas. I have searched the documentation but haven't found the answer I am looking for. What am I missing or not understanding?
Thank you very much in advance.
D.
The computation you're trying to do can be thought of as operating on the following dataframe:
pd.concat([a, b], axis=1)
0 0
0 1 8
1 2 9
2 3 8
3 4 NaN
4 5 NaN
5 6 NaN
6 7 NaN
7 8 NaN
8 9 NaN
9 8 NaN
10 7 NaN
11 6 NaN
12 5 NaN
13 4 NaN
14 3 NaN
15 2 NaN
16 1 NaN
If you're using window=3, it correlates the first three values in b with the first three values in a, leaving the rest as NaN, and places the value in the center of the window (center=True).
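There is also a difference in what the two functions compute. As far as I understand it, np.correlate returns a sliding dot product, whereas the pandas rolling correlation is a Pearson correlation coefficient. A minimal sketch comparing the two on the first window only (np.corrcoef stands in for the Pearson calculation here; that substitution is my assumption, not something pandas exposes under that name):

```python
import numpy as np

win = np.array([1, 2, 3])  # first length-3 window of c
d = np.array([8, 9, 8])

# Sliding dot product: 1*8 + 2*9 + 3*8
dot_product = np.correlate(win, d)[0]

# Pearson correlation coefficient between the same two vectors
pearson = np.corrcoef(win, d)[0, 1]

print(dot_product)  # 50
print(pearson)      # approximately 0
```

The Pearson value of roughly zero is consistent with the single 0.0 at index 1 in the output from the question.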
You can try:
pd.rolling_apply(a, window=3, func=lambda x: np.correlate(x, b[0]))
Output:
0
0 NaN
1 NaN
2 50
3 75
4 100
5 125
6 150
7 175
8 200
9 209
10 200
11 175
12 150
13 125
14 100
15 75
16 50
You can add center=True here too if you'd like the result placed at the center of each window rather than at its end.
(I'm using pandas 0.17.0)
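In later pandas releases the top-level pd.rolling_* functions were removed in favor of the .rolling() method. A rough equivalent of the call above, assuming a pandas version where Rolling.apply accepts raw=True, might look like the sketch below. The name kernel is mine, and np.dot is used instead of np.correlate so that the function returns a plain scalar; for two vectors of equal length the two give the same number.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 3, 2, 1])
kernel = np.array([8, 9, 8])  # hypothetical name for the values of b

# raw=True passes each window to the function as a plain ndarray
result = a.rolling(window=len(kernel)).apply(
    lambda w: np.dot(w, kernel), raw=True
)
print(result)

# Cross-check against numpy's sliding dot product over the whole array
expected = np.correlate(a.to_numpy(), kernel, mode="valid")
print(np.allclose(result.dropna().to_numpy(), expected))
```

The first two entries are NaN because a full window of three values is not yet available there.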