
Calculating rolling correlation of pandas dataframes

Is it possible to use the rolling window and correlation functions in pandas to correlate a shorter DataFrame or Series against a longer one, and get the result along the longer time series? Basically, I want what numpy.correlate does, but computing a pairwise (Pearson) correlation for each window instead of a cross-correlation.

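import numpy as np

x = [0, 1, 2, 3, 4, 5, 4, 7, 6, 9, 10, 5, 6, 4, 8, 7]
y = [4, 5, 4, 5]
print(x)
print(y)

# Correlate y against every len(y)-element window of x.
corrs = []
for i in range(len(x) - len(y) + 1):
    corrs.append(np.corrcoef(x[i:i + len(y)], y)[0, 1])
print(corrs)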

with a result of:

[0.4472135954999579, 0.4472135954999579, 0.4472135954999579, 0.0, 0.8164965809277259, -0.4472135954999579, 0.8320502943378437, 0.0, -0.24253562503633297, 0.24253562503633297, -0.7683498199278325, 0.8451542547285166, -0.50709255283711]

Every combination of window size and pairwise setting I tried gives either a series of NaN or a "ValueError: Length mismatch". In the simple test case I made, it's always NaN or a single result, never a full rolling window of values.

import pandas as pd

x = pd.DataFrame(x)
y = pd.DataFrame(y)

corr = y.rolling(np.shape(y)[0]).corr(x)
print(corr)
corr = y.rolling(np.shape(x)[0]).corr(x)
print(corr)
corr = x.rolling(np.shape(x)[0]).corr(y)
print(corr)
corr = x.rolling(np.shape(y)[0]).corr(y)
print(corr)
corr = y.rolling(np.shape(y)[0]).corr(x,pairwise=True)
print(corr)
corr = y.rolling(np.shape(x)[0]).corr(x,pairwise=True)
print(corr)
corr = x.rolling(np.shape(x)[0]).corr(y,pairwise=True)
print(corr)
corr = x.rolling(np.shape(y)[0]).corr(y,pairwise=True)
print(corr)
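
A likely contributor to the NaN results is index alignment: pandas correlation methods match their operands on index labels before computing anything, and a 4-row y only shares labels 0 to 3 with a 16-row x. The sketch below is an editorial illustration, not part of the original post. It uses Series.corr on a single window (the name `window` is introduced here for illustration) and does not reproduce the exact code paths behind the rolling calls above.

```python
import numpy as np
import pandas as pd

x = pd.Series([0, 1, 2, 3, 4, 5, 4, 7, 6, 9, 10, 5, 6, 4, 8, 7])
y = pd.Series([4, 5, 4, 5])

# One 4-element window of x; it keeps its original labels 4..7.
window = x.iloc[4:8]

# Series.corr aligns on index labels first. Labels 4..7 and 0..3 have
# nothing in common, so no value pairs remain and the result is NaN.
misaligned = window.corr(y)

# Relabelling the window as 0..3 pairs the values by position instead.
aligned = window.reset_index(drop=True).corr(y)

print(misaligned)
print(aligned)
```

With the labels reset, the value matches what np.corrcoef returns for the same four numbers.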

Use Rolling.apply with np.corrcoef, or with Series.corr. Series.corr aligns on index labels, so each window needs the same index values as y, which is why Series.reset_index with drop=True is necessary:

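import numpy as np
import pandas as pd

x = [0, 1, 2, 3, 4, 5, 4, 7, 6, 9, 10, 5, 6, 4, 8, 7]
y = [4, 5, 4, 5]

# Reference result from the plain loop in the question.
corrs = []
for i in range(len(x) - 3):
    corrs.append(np.corrcoef(x[i:i + 4], y)[0, 1])

x = pd.Series(x)
y = pd.Series(y)

# raw=True passes each window to the function as a NumPy array.
corr1 = x.rolling(len(y)).apply(lambda w: np.corrcoef(w, y)[0, 1], raw=True)
# raw=False passes a Series; reset its index so the labels match those of y.
corr2 = x.rolling(len(y)).apply(lambda w: w.reset_index(drop=True).corr(y), raw=False)

# Shift the loop result by 3 so it lines up with the right edge of each window.
print(pd.concat([pd.Series(corrs).rename(lambda i: i + 3), corr1, corr2], axis=1))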
           0         1         2
0        NaN       NaN       NaN
1        NaN       NaN       NaN
2        NaN       NaN       NaN
3   0.447214  0.447214  0.447214
4   0.447214  0.447214  0.447214
5   0.447214  0.447214  0.447214
6   0.000000  0.000000  0.000000
7   0.816497  0.816497  0.816497
8  -0.447214 -0.447214 -0.447214
9   0.832050  0.832050  0.832050
10  0.000000  0.000000  0.000000
11 -0.242536 -0.242536 -0.242536
12  0.242536  0.242536  0.242536
13 -0.768350 -0.768350 -0.768350
14  0.845154  0.845154  0.845154
15 -0.507093 -0.507093 -0.507093
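
For longer series, calling a Python function once per window can be slow. As an alternative that the answer above does not mention, here is a minimal vectorized sketch. It assumes NumPy 1.20 or later for sliding_window_view, and the names `windows`, `wc`, `yc` and `corrs_vec` are introduced here for illustration. It applies the Pearson formula to all windows at once and returns one value per full window, without the leading NaN padding.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.array([0, 1, 2, 3, 4, 5, 4, 7, 6, 9, 10, 5, 6, 4, 8, 7], dtype=float)
y = np.array([4, 5, 4, 5], dtype=float)

# Shape (len(x) - len(y) + 1, len(y)): one row per window of x.
windows = sliding_window_view(x, len(y))

# Centre each window and y, then apply the Pearson formula row by row.
wc = windows - windows.mean(axis=1, keepdims=True)
yc = y - y.mean()
num = wc @ yc
den = np.sqrt((wc ** 2).sum(axis=1) * (yc ** 2).sum())

# A constant window has zero variance; leave NaN there instead of dividing by zero.
corrs_vec = np.full(len(windows), np.nan)
np.divide(num, den, out=corrs_vec, where=den > 0)

print(corrs_vec)
```

To line the values up with the rolling output, wrap them in a Series indexed from len(y) - 1 onward, for example pd.Series(corrs_vec, index=range(len(y) - 1, len(x))).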
