Is it possible to do running correlation with one fixed series in Python?

I'm wondering if there is a fast way to do running correlation in Python with one fixed series? I've tried to use Pandas, for example: df1.rolling(4).corr(df2). However, it requires the two DataFrames to have the same length. Is there a way to do something similar to the above Pandas example, but with one DataFrame being fixed?

To clarify, I want to calculate the correlation coefficient between df2 below and successive windows of values in df1.

Example:
First correlation: between df2 and df1.loc[0:3]
Second correlation: between df2 and df1.loc[1:4]
etc.

I've managed to do this by creating a loop. However, I find it inefficient when working with larger DataFrames.

df1 = pd.DataFrame([1,3,2,4,5,6,3,4])
df2 = pd.DataFrame([1,2,3,2])
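
The loop itself is not shown in the question. As a rough sketch of what such a baseline might look like (the names fixed, n and loop_corr are hypothetical, and np.corrcoef is assumed as the way to get Pearson's r):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([1, 3, 2, 4, 5, 6, 3, 4])
df2 = pd.DataFrame([1, 2, 3, 2])

fixed = df2[0].to_numpy()   # the fixed series
n = len(fixed)              # every window must have this length

# Slide a window of length n over df1 and correlate each window with df2.
loop_corr = []
for start in range(len(df1) - n + 1):
    window = df1[0].iloc[start:start + n].to_numpy()
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r.
    loop_corr.append(np.corrcoef(window, fixed)[0, 1])
```

Here iloc[start:start + n] selects the same rows as df1.loc[0:3], df1.loc[1:4], etc. in the example above, since loc slices include the end label.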

You can use pandas.DataFrame.rolling, which returns a pandas.core.window.Rolling object that has an apply method. You can then pass to apply() any function that calculates the correlation you want.

Example

import pandas as pd
from scipy.stats import pearsonr 
import numpy as np 


df1 = pd.DataFrame([1,3,2,4,5,6,3,4,1,2,3,2,2,3,2,5,1,2,1,2,8,8,8,8,8,8,8])
df2 = pd.DataFrame([1,2,3,2])

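# Fixed series that every window of df1 is correlated against.
CORR_VALS = df2[0].values

def get_correlation(vals):
    # pearsonr returns (r, p-value); keep only the coefficient r.
    return pearsonr(vals, CORR_VALS)[0]

# Roll over the data column with a window as long as the fixed series.
df1['correlation'] = df1[0].rolling(window=len(CORR_VALS)).apply(get_correlation)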

  • Note that the window argument in df1.rolling() should have the same length as the array you are calculating the correlation against.
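
One way to keep the window length and the fixed array from drifting apart is to derive the window from the array inside a small helper. The sketch below is only an illustration: rolling_corr_fixed is a hypothetical name, and np.corrcoef is swapped in for pearsonr (both give Pearson's r).

```python
import numpy as np
import pandas as pd


def rolling_corr_fixed(series, fixed):
    """Correlate each trailing window of `series` with the array `fixed`.

    Hypothetical helper: the window length is always len(fixed),
    so the two cannot get out of sync.
    """
    fixed = np.asarray(fixed, dtype=float)

    def corr_with_fixed(window):
        # With raw=True below, `window` arrives as a plain NumPy array.
        return np.corrcoef(window, fixed)[0, 1]

    return series.rolling(window=len(fixed)).apply(corr_with_fixed, raw=True)


s = pd.Series([1, 3, 2, 4, 5, 6, 3, 4])
result = rolling_corr_fixed(s, [1, 2, 3, 2])
```

The first len(fixed) - 1 entries of result are NaN because those positions do not yet have a full window.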

This outputs:

In [5]: df1['correlation'].values
Out[5]:
array([        nan,         nan,         nan,  0.31622777,  0.31622777,
        0.71713717,  0.63245553, -0.63245553, -0.39223227, -0.63245553,
       -0.63245553,  1.        ,  0.        , -0.70710678,  0.81649658,
        0.        ,  0.47809144, -0.23570226, -0.64699664,  0.        ,
        0.        ,  0.7570333 ,  0.76509206,  0.11043153, -0.77302068,
       -0.11043153,  0.86164044])

which would look like this:

[image: enter image description here]
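
Since the question asks for a fast approach on larger data, a fully vectorized variant may also be worth trying. This is not part of the answer above; it is a sketch that assumes NumPy 1.20 or later (for sliding_window_view), and sliding_corr is a hypothetical name. It computes Pearson's r for all windows at once instead of calling a Python function per window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def sliding_corr(x, y):
    """Pearson's r between the fixed array y and every length-len(y) window of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # One row per window: shape (len(x) - len(y) + 1, len(y)).
    windows = sliding_window_view(x, len(y))

    # Center each window and the fixed series on their own means.
    xc = windows - windows.mean(axis=1, keepdims=True)
    yc = y - y.mean()

    num = xc @ yc
    den = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum())

    # Constant windows have zero variance; let them come out as NaN quietly.
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den


vec = sliding_corr([1, 3, 2, 4, 5, 6, 3, 4], [1, 2, 3, 2])
```

Unlike the rolling() result, vec has no leading NaN entries: vec[0] corresponds to the first full window, so prepend len(y) - 1 NaN values if it needs to line up with the original index.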
