
How to include lagged variables in statsmodels OLS regression

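Is there a way to specify a lagged independent variable in a statsmodels OLS regression? Below is a sample dataframe and the OLS model specification. I would like to include a lagged variable in the model.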

df = pd.DataFrame({
                   "y": [2,3,7,8,1],
                   "x": [8,6,2,1,9],
                   "v": [4,3,1,3,8]
                 })

Current model:

model = sm.ols(formula = 'y ~ x + v', data=df).fit()

Desired model:

model_lag = sm.ols(formula = 'y ~ (x-1) + v', data=df).fit()

 

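I don't think you can call it on the fly inside the formula. Maybe use the shift method instead? If this isn't what you need, please clarify.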

import statsmodels.api as sm
df['xlag'] = df['x'].shift()
df

   y  x  v  xlag
0  2  8  4   NaN
1  3  6  3   8.0
2  7  2  1   6.0
3  8  1  3   2.0
4  1  9  8   1.0

sm.formula.ols(formula = 'y ~ xlag + v', data=df).fit()
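One thing worth checking with the shift approach is what happens to the `NaN` in the first row of `xlag`. The sketch below reuses the question's data and assumes the default missing-value handling of the formula interface, which drops rows containing `NaN` before fitting. The names `res`, `n_obs` and `param_names` are only illustrative.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"y": [2, 3, 7, 8, 1], "x": [8, 6, 2, 1, 9], "v": [4, 3, 1, 3, 8]})

# shift(1) moves x down by one row, so the first row of xlag is NaN
df["xlag"] = df["x"].shift(1)

# With default settings the formula interface drops the NaN row before fitting
res = smf.ols("y ~ xlag + v", data=df).fit()

n_obs = int(res.nobs)                 # rows actually used in the fit
param_names = list(res.params.index)  # names of the estimated coefficients
print(n_obs)
print(param_names)
```

So each additional lag costs one observation at the start of the sample, which matters with a dataset this small.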

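This already has an accepted answer, but to add my 2 cents:

  • It is good practice to validate the index before shifting (otherwise your lag may not be what you think it is)
  • You can define a function that can be reused in many places in the formula

Some example code: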

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({"y": [2, 3, 7, 8, 1], "x": [8, 6, 2, 1, 9], "v": [4, 3, 1, 3, 8]})


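# Quarterly index 2000Q1 .. 2001Q1. period_range is used here because the
# year=/quarter= keywords of the PeriodIndex constructor are deprecated in recent pandas.
df.index = pd.period_range(start="2000Q1", periods=5, freq="Q", name="period")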


def lag(x, n, validate=True):
    """Calculates the lag of a pandas Series

    Args:
        x (pd.Series): the data to lag
        n (int): How many periods to go back (lag length)
        validate (bool, optional): Validate the series index (monotonic increasing + no gaps + no duplicates). 
                                If specified, expect the index to be a pandas PeriodIndex
                                Defaults to True.

    Returns:
        pd.Series: pd.Series.shift(n) -- lagged series
    """

    if n == 0:
        return x

    if validate and isinstance(x, pd.Series):
        assert x.index.is_monotonic_increasing, "\u274c x.index is not monotonic_increasing"
        assert x.index.is_unique, "\u274c x.index is not unique"
        idx_full = pd.period_range(start=x.index.min(), end=x.index.max(), freq=x.index.freq)
        # .equals() also handles indexes of different length, which is exactly
        # the case when periods are missing (an element-wise == would raise instead)
        assert x.index.equals(idx_full), "\u274c Gaps found in x.index"

    return x.shift(n)


# Manually create lag as variable:
df["x_1"] = df["x"].shift(1)
smf.ols(formula="y ~ x_1 + v", data=df).fit().summary()


# Use the defined function in the formula:
smf.ols(formula="y ~ lag(x,1) + v", data=df).fit().summary()

# ... can use in multiple places too:
smf.ols(formula="y ~ lag(x,1) + lag(v, 2)", data=df).fit().summary()
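To see why validating the index matters, here is a minimal sketch with a hypothetical quarterly series in which 2000Q3 is missing. A plain `shift(1)` is positional, so it silently pairs 2000Q4 with the 2000Q2 value. Reindexing to the full period range before shifting is one possible way to get a lag that respects the calendar. It is an alternative to raising an error as `lag()` does, not part of the answer's code.

```python
import pandas as pd

# Hypothetical quarterly series with a gap: 2000Q3 is missing
idx = pd.PeriodIndex(["2000Q1", "2000Q2", "2000Q4", "2001Q1"], freq="Q", name="period")
s = pd.Series([8, 6, 1, 9], index=idx, name="x")

# Positional shift: 2000Q4 receives the 2000Q2 value, which is a two-quarter lag
naive = s.shift(1)

# Calendar-aware lag: fill in the missing periods, shift, then keep the original rows
full = pd.period_range(start=idx.min(), end=idx.max(), freq="Q")
safe = s.reindex(full).shift(1).reindex(idx)

print(pd.DataFrame({"x": s, "naive_lag": naive, "safe_lag": safe}))
```

In the `safe` version the lag for 2000Q4 is `NaN`, because the previous quarter was never observed, so that row would be dropped from the regression instead of being fitted against the wrong period.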

