Statsmodels OLS with rolling window problem
I would like to run a regression over a rolling window, but I only get a single parameter back after fitting:
rolling_beta = sm.OLS(X2, X1, window_type='rolling', window=30).fit()
rolling_beta.params
The result:
X1 5.715089
dtype: float64
What could be the problem?
Thanks in advance, Roland
I think the problem is that the parameters window_type='rolling' and window=30 simply do not do anything. First I'll show you why, and at the end I'll provide a setup I've got lying around for linear regressions on rolling windows.
1. The problem with your function:
Since you haven't provided any sample data, here's a function that returns a dataframe of a desired size filled with random numbers:
# Function to build synthetic data
import numpy as np
import pandas as pd
import statsmodels.api as sm
from collections import OrderedDict

def sample(rSeed, periodLength, colNames):
    # Seed the generator so the random columns are reproducible
    np.random.seed(rSeed)
    # First date of the generated daily index
    date = pd.to_datetime("2018-12-01")
    cols = OrderedDict()
    for col in colNames:
        cols[col] = np.random.normal(loc=0.0, scale=1.0, size=periodLength)
    # One row per calendar day, starting at `date`
    dates = date + pd.to_timedelta(np.arange(periodLength), 'D')
    df = pd.DataFrame(cols, index=dates)
    return df

df = sample(rSeed=123, colNames=['X1', 'X2'], periodLength=50)
print(df)
Output:
X1 X2
2018-12-01 -1.085631 -1.294085
2018-12-02 0.997345 -1.038788
2018-12-03 0.282978 1.743712
2018-12-04 -1.506295 -0.798063
2018-12-05 -0.578600 0.029683
.
.
.
2019-01-17 0.412912 -1.363472
2019-01-18 0.978736 0.379401
2019-01-19 2.238143 -0.379176
Now, try:
rolling_beta = sm.OLS(df['X2'], df['X1'], window_type='rolling', window=30).fit()
rolling_beta.params
Output:
X1 -0.075784
dtype: float64
This at least reproduces the structure of your output: you expect an estimate for each sample window, but you get a single estimate instead. So I looked around for other examples using the same function, both online and in the statsmodels docs, but I was unable to find any that actually worked. What I did find were a few discussions saying that this functionality was deprecated a while ago. So I then tested the same function with some bogus input for the parameters:
rolling_beta = sm.OLS(df['X2'], df['X1'], window_type='amazing', window=3000000).fit()
rolling_beta.params
Output:
X1 -0.075784
dtype: float64
As you can see, the estimates are the same, and no error message is returned for the bogus input. So I suggest that you take a look at the function below. It is something I've put together to perform rolling regression estimates.
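As a side note, the single number is what an ordinary full-sample fit would return. Since no constant was added, a one-regressor OLS estimates a slope through the origin, which has a simple closed form. The sketch below uses plain NumPy on made-up data (the arrays `x` and `y` are assumptions for illustration, not the data from the question) and checks the closed form against a general least-squares solver:

```python
import numpy as np

# Synthetic stand-ins for X1 and X2 (illustrative only)
rng = np.random.default_rng(123)
x = rng.normal(size=50)
y = rng.normal(size=50)

# Closed-form OLS slope without an intercept: beta = sum(x*y) / sum(x*x)
beta_full = (x @ y) / (x @ x)

# The same estimate from a general least-squares solver on the full sample
beta_lstsq = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]

# Slope through the origin using only the last 30 observations
beta_last30 = (x[-30:] @ y[-30:]) / (x[-30:] @ x[-30:])

print(beta_full, beta_lstsq, beta_last30)
```

The first two numbers agree, while the 30-observation slope will generally differ. A rolling regression has to produce one such estimate per window.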
2. A function for regressions on rolling windows of a pandas dataframe
df = sample(rSeed = 123, colNames = ['X1', 'X2', 'X3'], periodLength = 50)
def RegressionRoll(df, subset, dependent, independent, const, win, parameters):
    """
    RegressionRoll takes a dataframe, makes a subset of the data if you like,
    and runs a series of regressions with a specified window length, and
    returns a dataframe with BETA or R^2 for each window split of the data.

    Parameters:
    ===========
    df: pandas dataframe
    subset: integer - has to be smaller than the size of the df (0 uses all rows)
    dependent: string that specifies name of dependent variable
    independent: LIST of strings that specifies names of independent variables
    const: boolean - whether or not to include a constant term
    win: integer - window length of each model
    parameters: string that specifies which model parameters to return:
                'beta' or 'R2'

    Example:
    ========
    RegressionRoll(df=df, subset = 50, dependent = 'X1', independent = ['X2'],
                   const = True, parameters = 'beta', win = 30)
    """
    # Data subset
    if subset != 0:
        df = df.tail(subset)
    else:
        df = df

    # Loopinfo
    end = df.shape[0]
    win = win
    # As written, stop=end leaves out the window that ends on the last row
    rng = np.arange(start = win, stop = end, step = 1)

    # Subset and store dataframes
    frames = {}
    n = 1
    for i in rng:
        df_temp = df.iloc[:i].tail(win)
        newname = 'df' + str(n)
        frames.update({newname: df_temp})
        n += 1

    # Analysis on subsets
    df_results = pd.DataFrame()
    for frame in frames:
        #print(frames[frame])
        # Rolling data frames
        dfr = frames[frame]
        y = dependent
        x = independent
        if const == True:
            x = sm.add_constant(dfr[x])
            model = sm.OLS(dfr[y], x).fit()
        else:
            model = sm.OLS(dfr[y], dfr[x]).fit()
        if parameters == 'beta':
            theParams = model.params[0:]
            coefs = theParams.to_frame()
            df_temp = pd.DataFrame(coefs.T)
            # Label each estimate with the last date of its window
            indx = dfr.tail(1).index[-1]
            df_temp['Date'] = indx
            df_temp = df_temp.set_index(['Date'])
        if parameters == 'R2':
            theParams = model.rsquared
            df_temp = pd.DataFrame([theParams])
            indx = dfr.tail(1).index[-1]
            df_temp['Date'] = indx
            df_temp = df_temp.set_index(['Date'])
            df_temp.columns = [', '.join(independent)]
        df_results = pd.concat([df_results, df_temp], axis = 0)
    return(df_results)
df_rolling = RegressionRoll(df=df, subset = 50, dependent = 'X1', independent = ['X2'], const = True, parameters = 'beta',
win = 30)
Output: A dataframe with the beta estimates from OLS regressions of X1 on X2 (plus a constant) for each 30-period window of the data, indexed by the last date of each window.
const X2
Date
2018-12-30 0.044042 0.032680
2018-12-31 0.074839 -0.023294
2019-01-01 -0.063200 0.077215
.
.
.
2019-01-16 -0.075938 -0.215108
2019-01-17 -0.143226 -0.215524
2019-01-18 -0.129202 -0.170304
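If you only need the coefficients, the same idea can be sketched more compactly with NumPy alone. This is a minimal illustration under stated assumptions, not a drop-in replacement for the function above: `rolling_ols` is a hypothetical helper name, the data are synthetic, and unlike `RegressionRoll` it also includes the window that ends on the last row.

```python
import numpy as np

def rolling_ols(y, X, window, add_const=True):
    """Fit OLS on every full window of `window` consecutive rows.

    Returns an array of shape (n_windows, n_coefs); row k holds the
    coefficients estimated from rows k .. k + window - 1.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if add_const:
        # Constant goes first, matching the 'const' column in the output above
        X = np.column_stack([np.ones(len(X)), X])
    n = len(y)
    if window > n:
        raise ValueError("window is longer than the data")
    coefs = np.empty((n - window + 1, X.shape[1]))
    for k in range(n - window + 1):
        sl = slice(k, k + window)
        # Least-squares solution for this window only
        coefs[k], *_ = np.linalg.lstsq(X[sl], y[sl], rcond=None)
    return coefs

# Synthetic data (illustrative only): y = 2 + 0.5 * x + noise
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=50)

betas = rolling_ols(y, x, window=30)
print(betas.shape)  # (21, 2): 50 - 30 + 1 windows, [const, slope] per window
```

Recent statsmodels releases also provide a `RollingOLS` class in `statsmodels.regression.rolling`, which may be worth checking if your installed version has it.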