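Time series analysis with covariates
I have a data set containing data from thousands of individuals, in which a parameter x has been measured once a year for the past 9 years.
Essentially, the data sit in a data frame df: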
id,year,x,feature
A,2016,376,female
A,2015,391,female
A,2014,376,female
A,2013,373,female
A,2012,347,female
A,2011,330,female
B,2016,398,male
B,2015,391,male
B,2014,410,male
B,2013,393,male
B,2012,408,male
B,2011,288,male
C,2016,2464,male
C,2015,2465,male
C,2014,2500,male
C,2013,2215,male
C,2012,2228,male
C,2011,1839,male
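and so on.
I would like to estimate different models for these time series, something like
predict(x(t)) = f(x(t-1), x(t-2), ..., x(t-n), feature, id (as a random factor))
I can see how to do autoregressive modelling with ts, but that would compute thousands of individual models, whereas I want a global prediction based on the time history and the features (with the problems inherent in that).
lm is not a good idea, because the data are highly autocorrelated. Any good ideas?
There are many possible models, but here is a mixed-effects model with an AR1 correlation structure that you could try.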
library(nlme)
fm <- lme(x ~ year + feature, random = ~ year | id, data = DF,
          correlation = corAR1(form = ~ year | id))
summary(fm)
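To illustrate how such a fit can be used for prediction, here is a minimal sketch on simulated data. The simulated panel, the variable names (sim, yr, id_effect) and the simpler random-intercept structure are all assumptions made for illustration; they are not part of the model above.

```r
library(nlme)

set.seed(1)
n_id  <- 40
years <- 2011:2016

# Simulated panel (an assumption for illustration, not the poster's data).
# expand.grid() varies year fastest, so rows are grouped by id.
sim <- expand.grid(year = years, id = factor(seq_len(n_id)))
sim$yr <- sim$year - min(years)   # integer time index 0..5
sim$feature <- factor(ifelse(as.integer(sim$id) %% 2 == 0, "male", "female"))

# Per-individual intercept shifts and AR(1) noise generated separately per id
id_effect <- rnorm(n_id, mean = 0, sd = 50)
ar_noise <- unlist(lapply(seq_len(n_id), function(i) {
  as.numeric(arima.sim(list(ar = 0.5), n = length(years), sd = 10))
}))
sim$x <- 400 + 10 * sim$yr + 30 * (sim$feature == "male") +
  id_effect[as.integer(sim$id)] + ar_noise

# Random intercept per id plus AR(1) within-id residual correlation
fit <- lme(x ~ yr + feature, random = ~ 1 | id, data = sim,
           correlation = corAR1(form = ~ yr | id))

# Estimated AR(1) parameter on its natural (-1, 1) scale
phi_hat <- coef(fit$modelStruct$corStruct, unconstrained = FALSE)

# Prediction one year ahead for an existing individual:
# level 0 = population level, level 1 = including that id's random effect
new_row <- data.frame(yr = 6L,
                      feature = factor("female", levels = levels(sim$feature)),
                      id = factor("1", levels = levels(sim$id)))
pred <- predict(fit, newdata = new_row, level = 0:1)
```

The level-0 prediction applies to individuals not seen during fitting, while the level-1 prediction is only available for ids that were in the training data.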
Here is a plot of the data:
library(ggplot2)
ggplot(DF, aes(year, x, group = id, col = feature)) + geom_line() + geom_point()
Note: we assume this input data:
Lines <- "
id,year,x,feature
A,2016,376,female
A,2015,391,female
A,2014,376,female
A,2013,373,female
A,2012,347,female
A,2011,330,female
B,2016,398,male
B,2015,391,male
B,2014,410,male
B,2013,393,male
B,2012,408,male
B,2011,288,male
C,2016,2464,male
C,2015,2465,male
C,2014,2500,male
C,2013,2215,male
C,2012,2228,male
C,2011,1839,male"
library(zoo)
DF <- read.csv(text = Lines, strip.white = TRUE)
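Before fitting with corAR1(form = ~ year | id), it may be worth checking the panel layout, since that correlation structure expects unique, integer-valued time points within each id. The following is a small sketch using a stand-in data frame (panel is a hypothetical name, holding a subset of the rows above):

```r
# Small stand-in with the same columns as DF
panel <- read.csv(text = "id,year,x,feature
A,2013,373,female
A,2012,347,female
A,2011,330,female
B,2013,393,male
B,2012,408,male
B,2011,288,male", strip.white = TRUE)

# 0 means no (id, year) combination occurs more than once
dup_rows <- anyDuplicated(panel[c("id", "year")])

# Sort by id and year so that time runs in ascending order within each id
panel <- panel[order(panel$id, panel$year), ]

# TRUE for an id when its consecutive years are exactly one year apart
year_gaps <- tapply(panel$year, panel$id, function(y) all(diff(y) == 1))
```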
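The statement about the function f() opens up many options.
Within the linear class, however, you can use vector generalized linear models (via vglm()) to fit generalized linear models with an ARMA (or GARCH) structure, combined with covariates.
For example, assuming that the random errors are normally distributed, you can use the family function ARff() from the package VGAMextra, as shown below.
A second option is the nonparametric version, i.e. VGAMs, which relies on smart prediction. The only drawback is that vglms / vgams do not handle random effects.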
library(VGAM)
library(VGAMextra)
# Fitting a linear model to the mean of the normal distribution
# allowing an AR(3) structure. Use the modelling function vglm() and
# the family function ARff()
# DF as given by G.G., sorted so that time runs forward within each id
df.read <- DF[order(DF$id, DF$year), ]
fit.Lines <- vglm(x ~ feature, ARff(order = 3,
                                    zero = c("Var", "ARcoeff")),
                  data = df.read, trace = TRUE)
coef(fit.Lines, matrix = TRUE)
summary(fit.Lines, HD = FALSE)
with(df.read, plot(fitted.values(fit.Lines) ~ year,
                   ylim = c(0, 3000),
                   pch = 19, col = as.factor(feature)))
# Using VGAMs, here, the family function uninormal() is utilized.
# Note: embed() works on the stacked column, so the first lagged rows
# of each id still borrow values from the previous id.
df.read2 <- data.frame(embed(df.read$x, 4))
names(df.read2) <- c("x", "xLag1", "xLag2", "xLag3")
df.read2 <- transform(df.read2, year = df.read$year[-c(1:3)],
                      feature = df.read$feature[-c(1:3)])
fit.Lines.vgams <- vgam(x ~ sm.bs(xLag1) + sm.bs(xLag2) +
                          sm.bs(xLag3) + feature + year,
                        uninormal, data = df.read2, trace = TRUE)
with(df.read2, plot(fitted.values(fit.Lines.vgams) ~ year,
                    ylim = c(0, 3000),
                    pch = 19, col = as.factor(feature)))
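One caveat with embed() on the stacked x column is that the lags run across the boundaries between individuals, and their direction depends on the row order of the data. A possible way to build lags within each id, sketched in base R, is shown below. The helper lag_k and the data frame panel are hypothetical names, and the sketch assumes every id has more than three observations.

```r
# Stand-in panel with values taken from ids A and B above
panel <- data.frame(
  id   = rep(c("A", "B"), each = 5),
  year = rep(2011:2015, times = 2),
  x    = c(330, 347, 373, 376, 391, 288, 408, 393, 410, 391)
)

# Hypothetical helper: shift a vector by k positions, padding with NA
lag_k <- function(v, k) c(rep(NA, k), head(v, -k))

# Sort so that time runs forward within each id, then lag per id
panel <- panel[order(panel$id, panel$year), ]
for (k in 1:3) {
  panel[[paste0("xLag", k)]] <- ave(panel$x, panel$id,
                                    FUN = function(v) lag_k(v, k))
}

# Drop the first three years of each id, which have incomplete lags
lagged <- na.omit(panel)
```

The resulting lagged data frame could then take the place of df.read2 in the vgam() call, after adding the feature column, at the cost of losing the first three years of every individual.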