[英]R - overestimation predict function in sinusoidal linear model
背景
我有一个数据文件,其中包含一段时间内荷兰风暴屏障附近的海平面。 目标是拟合描述海平面这种演变的线性模型,并在一定的时间范围内,提前5分钟预测海平面(预测)。
途径
给定某天(正手选择),我选择了一个要拟合\\训练线性模型的时间范围。 经过一些技术调整(请参阅下面的实际代码),我对模型进行了拟合。 然后,在命令“ predict()”中使用线性模型对象和5分钟的时间范围进行预测,并绘制“预测”和置信区间,并在第一个时间帧内紧随拟合模型之后,全部在一个图中(参见下面的示例图)。
问题
模型的预测总是过高或过低。 就数量级而言,预测相差10 ^ 10(或等值e + 10)。 同时,R ^ 2和R_adj ^ 2“相当高”(分别为0,972和0,9334),模型诊断(杠杆,拟合vs残差,正常qq)看起来“合理”。 因此,我的问题是:如何很好地拟合数据,预测/预测如此糟糕? 我唯一的解释是代码中有错误,但我找不到它。
数据集
更具体地说,数据集是一个数据帧,由3列组成(除索引列外):'日期'(“ yyyy-mm-dd”格式),“时间”(“ hh:mm:ss”格式)和“水”(整数介于-150和350之间,海平面以cm为单位)。 这是一小部分数据,已经引起了上述问题:
> SeaLvlAug30[fitRngAug, ]
date time water
1574161 2010-08-30 04:40:00 253
1574162 2010-08-30 04:40:10 254
1574163 2010-08-30 04:40:20 253
1574164 2010-08-30 04:40:30 250
1574165 2010-08-30 04:40:40 250
1574166 2010-08-30 04:40:50 252
1574167 2010-08-30 04:41:00 250
1574168 2010-08-30 04:41:10 247
1574169 2010-08-30 04:41:20 246
1574170 2010-08-30 04:41:30 245
1574171 2010-08-30 04:41:40 242
1574172 2010-08-30 04:41:50 241
1574173 2010-08-30 04:42:00 242
1574174 2010-08-30 04:42:10 244
1574175 2010-08-30 04:42:20 245
1574176 2010-08-30 04:42:30 247
1574177 2010-08-30 04:42:40 247
1574178 2010-08-30 04:42:50 249
1574179 2010-08-30 04:43:00 250
1574180 2010-08-30 04:43:10 250
最小的可运行R代码
# Construct a time frame of a day with steps of 10 seconds
SeaLvlDayTm <- c(1:8640)*10
# Construct the desired fit Range and prediction Range
ftRng <- c(1:20)
predRng <- c(21:50)
# Construct the desired columns for the data frame
date <- rep("2010-08-30", length(c(ftRng,predRng)))
time <- c("04:40:00", "04:40:10", "04:40:20", "04:40:30", "04:40:40", "04:40:50", "04:41:00",
"04:41:10", "04:41:20", "04:41:30", "04:41:40", "04:41:50", "04:42:00", "04:42:10",
"04:42:20", "04:42:30", "04:42:40", "04:42:50", "04:43:00", "04:43:10", "04:43:20",
"04:43:30", "04:43:40", "04:43:50", "04:44:00", "04:44:10", "04:44:20", "04:44:30",
"04:44:40", "04:44:50", "04:45:00", "04:45:10", "04:45:20", "04:45:30", "04:45:40",
"04:45:50", "04:46:00", "04:46:10", "04:46:20", "04:46:30", "04:46:40", "04:46:50",
"04:47:00", "04:47:10", "04:47:20", "04:47:30", "04:47:40", "04:47:50", "04:48:00",
"04:48:10")
timeSec <- c(1681:1730)*10
water <- c(253, 254, 253, 250, 250, 252, 250, 247, 246, 245, 242, 241, 242, 244, 245, 247,
247, 249, 250, 250, 249, 249, 250, 249, 246, 246, 248, 248, 245, 247, 251, 250,
251, 255, 256, 256, 257, 259, 257, 256, 260, 260, 257, 260, 261, 258, 256, 256,
258, 258)
# Construct the data frame
SeaLvlAugStp2 <- data.frame(date, time, timeSec, water)
# Change the index set of the data frame to correspond that of a year
rownames(SeaLvlAugStp2) <- c(1574161:1574210)
#Use a seperate variable for the time (because of a weird error)
SeaLvlAugFtTm <- SeaLvlAugStp2$timeSec[ftRng]
# Fit the linear model
lmObjAug <- lm(SeaLvlAugStp2$water[ftRng] ~ sin((2*pi)/44700 * SeaLvlAugFtTm)
+ cos((2*pi)/44700 * SeaLvlAugFtTm) + poly(SeaLvlAugFtTm, 3)
+ cos((2*pi)/545 * SeaLvlAugFtTm) + sin((2*pi)/545 * SeaLvlAugFtTm)
+ cos((2*pi)/205 * SeaLvlAugFtTm) + sin((2*pi)/205 * SeaLvlAugFtTm)
+ cos((2*pi)/85 * SeaLvlAugFtTm) + sin((2*pi)/85 * SeaLvlAugFtTm),
data = SeaLvlAug30Stp2[ftRng, ])
# Get information about the linear model fit
summary(lmObjAug)
plot(lmObjAug)
#Compute time range prediction and fit
prdtRngTm <- timeSec[prdtRng]
ftRngTm <- timeSec[ftRng]
#Compute prediction/forecast based on fitted data linear model
prdtAug <- predict(lmObjAug, newdata=data.frame(SeaLvlAugFtTm = prdtRngTm), interval="prediction", level=0.95)
#Calculate lower and upper bound confidence interval prediction
lwrAug <- prdtAug[, 2]
uprAug <- prdtAug[, 3]
#Calculate minimum and maximum y axis plot
yminAug <- min(SeaLvlAug30$water[fitRngAug], SeaLvlAug30$water[prdtRngAug], lwrAug)
ymaxAug <- max(SeaLvlAug30$water[fitRngAug], SeaLvlAug30$water[prdtRngAug], uprAug)
#Make the plot
plot((timeSec/10)[ftRng], SeaLvlAugStp2$water[ftRng], xlim = c(min(timeSec/10), max(prdtRngAug30)), ylim = c(yminAug, ymaxAug), col = 'green', pch = 19, main = "Sea Level high water & prediction August 30 ", xlab = "Time (seconds)", ylab = "Sea Level (cm)")
polygon(c(sort(prdtRngTm/10), rev(sort(prdtRngTm/10))), c(uprAug, rev(lwrAug)), col = "gray", border = "gray")
points(prdtRngTm/10, SeaLvlAug30$water[prdtRngTm/10], col = 'green', pch = 19)
lines(ftRngTm/10, fitted(lmObjAug), col = 'blue', lwd = 2)
lines(prdtRngTm/10, prdtAug[, 1], col = 'blue', lwd = 2)
legend("topleft", legend = c("Observ.", "Predicted", "Conf. Int."), lwd = 2, col=c("green", "blue", "gray"), lty = c(NA, 1, 1), pch = c(19, NA, NA))
样例图
在用我们可以运行的代码发布问题之前,我们将无法为您提供更多帮助,但是与此同时,这是Rob J Hyndman预测软件包中的快速而肮脏的预测:
string_data <- "row date time water
1574161 2010-08-30 04:40:00 253
1574162 2010-08-30 04:40:10 254
1574163 2010-08-30 04:40:20 253
1574164 2010-08-30 04:40:30 250
1574165 2010-08-30 04:40:40 250
1574166 2010-08-30 04:40:50 252
1574167 2010-08-30 04:41:00 250
1574168 2010-08-30 04:41:10 247
1574169 2010-08-30 04:41:20 246
1574170 2010-08-30 04:41:30 245
1574171 2010-08-30 04:41:40 242
1574172 2010-08-30 04:41:50 241
1574173 2010-08-30 04:42:00 242
1574174 2010-08-30 04:42:10 244
1574175 2010-08-30 04:42:20 245
1574176 2010-08-30 04:42:30 247
1574177 2010-08-30 04:42:40 247
1574178 2010-08-30 04:42:50 249
1574179 2010-08-30 04:43:00 250
1574180 2010-08-30 04:43:10 250"
SeaLvlAug30 <- read.table(textConnection(string_data), header=TRUE)
library(forecast)
fit <- auto.arima(SeaLvlAug30$water)
summary(fit)
preds <- forecast(fit, h = 25)
preds
# Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
# 21 249.7563 247.7314 251.7812 246.6595 252.8531
# 22 249.4394 246.1177 252.7611 244.3593 254.5195
# 23 249.1388 244.9831 253.2945 242.7832 255.4944
# 24 248.8930 244.2626 253.5234 241.8114 255.9746
# 25 248.7110 243.8397 253.5822 241.2610 256.1609
# 26 248.5867 243.6073 253.5661 240.9713 256.2021
# 27 248.5085 243.4867 253.5302 240.8284 256.1885
# 28 248.4636 243.4280 253.4991 240.7624 256.1647
# 29 248.4410 243.4020 253.4800 240.7345 256.1475
# 30 248.4322 243.3927 253.4718 240.7249 256.1396
# 31 248.4311 243.3916 253.4707 240.7238 256.1385
# 32 248.4337 243.3941 253.4733 240.7263 256.1411
# 33 248.4376 243.3979 253.4773 240.7300 256.1452
# 34 248.4414 243.4016 253.4812 240.7337 256.1492
# 35 248.4447 243.4048 253.4845 240.7368 256.1525
# 36 248.4471 243.4072 253.4870 240.7392 256.1550
# 37 248.4488 243.4089 253.4887 240.7409 256.1567
# 38 248.4499 243.4100 253.4898 240.7420 256.1578
# 39 248.4505 243.4106 253.4905 240.7426 256.1585
# 40 248.4509 243.4109 253.4908 240.7429 256.1588
# 41 248.4510 243.4111 253.4910 240.7431 256.1589
# 42 248.4510 243.4111 253.4910 240.7431 256.1590
# 43 248.4510 243.4111 253.4910 240.7431 256.1590
# 44 248.4510 243.4110 253.4909 240.7430 256.1589
# 45 248.4509 243.4110 253.4909 240.7430 256.1589
plot(preds)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.