R: Robust nonlinear least squares fitting of three-phase linear model with confidence & prediction intervals
I would like to fit a monotonically increasing three-phase linear model in R using nls. Say I have the data
y <- c(4.5,4.3,2.57,4.40,4.52,1.39,4.15,3.55,2.49,4.27,4.42,4.10,2.21,2.90,1.42,1.50,1.45,1.7,4.6,3.8,1.9)
x <- 1500-c(320,419,650,340,400,800,300,570,720,480,425,460,675,600,850,920,975,1022,450,520,780)
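I would like to obtain a fit with breakpoints in x at x=B1 and x=B2, plus 95% confidence and prediction intervals, which I would like to calculate from the nls fit using the predFit function of the investr package.
The model here is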
y=(x < B1)*a +
(x >= B1)*(x <= B2)*(a + b*(x - B1)) +
(x > B2)*(a + b*(B2 - B1))
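The three indicator terms can also be collapsed into a single clamped expression, which may be easier to reuse. The sketch below illustrates that equivalence; the function names `three_phase` and `three_phase_clamped` and the parameter values are made up for illustration.

```r
# Three-phase linear model written with indicator terms, as in the formula above
three_phase <- function(x, a, b, B1, B2) {
  (x < B1) * a +
    (x >= B1) * (x <= B2) * (a + b * (x - B1)) +
    (x > B2) * (a + b * (B2 - B1))
}

# Equivalent form: clamp x to [B1, B2], then apply the slope
three_phase_clamped <- function(x, a, b, B1, B2) {
  a + b * (pmin(pmax(x, B1), B2) - B1)
}

# Illustrative parameter values (assumed, not estimates)
xg <- seq(400, 1300, by = 50)
p1 <- three_phase(xg, a = 1.5, b = 0.008, B1 = 700, B2 = 1050)
p2 <- three_phase_clamped(xg, a = 1.5, b = 0.008, B1 = 700, B2 = 1050)
all.equal(p1, p2)
```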
Given the desired constraint that B1 should be > min(x), I then set
B1 = min(x)+exp(logB1minminx)
To make sure that B2 > B1, I set
B2 = B1+exp(logB2minB1)
and to make sure that the slope b of the middle segment is > 0, I set
b = exp(logb)
[I am not quite sure how to impose the remaining constraint, B2 < max(x).]
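One possible way to keep B2 below max(x) as well, offered only as an assumption, is to place B2 at a logistic fraction of the distance between B1 and max(x). The helper name `to_natural` and the parameter `qB2` are hypothetical, and the numbers are illustrative.

```r
# Map unconstrained parameters to the natural ones.
# B1 > min(x) and b > 0 use the exp() transforms described above;
# B2 uses plogis() (an assumed alternative) so that B1 < B2 < max(x).
to_natural <- function(par, x) {
  B1 <- min(x) + exp(par[["logB1minminx"]])
  B2 <- B1 + (max(x) - B1) * plogis(par[["qB2"]])
  b  <- exp(par[["logb"]])
  c(B1 = B1, B2 = B2, a = par[["a"]], b = b)
}

x_demo <- c(478, 600, 800, 1000, 1200)
nat <- to_natural(c(logB1minminx = log(172), qB2 = 0.5, a = 1.5, logb = -4.8), x_demo)
nat
```

This only behaves as intended while B1 itself stays below max(x), for example through an upper bound on logB1minminx.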
To get an initial idea of a reasonable starting value for the slope parameter b:
f <- function (d) {
m <- lm(y~x, as.data.frame(d))
return(coef(m)[2])
}
require(zoo)
slopes <- rollapply(data.frame(x=x,y=y), 3, f, by.column=F)
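Note that rollapply moves over consecutive rows, so on data that are not ordered by x each window combines three arbitrary x values. Sorting by x first may give more interpretable local slopes. A base-R sketch under that assumption (the helper name `local_slopes` and the toy data are hypothetical):

```r
# Slopes of a 3-point moving regression, computed on data sorted by x
local_slopes <- function(x, y, width = 3) {
  o  <- order(x)
  xs <- x[o]
  ys <- y[o]
  starts <- seq_len(length(xs) - width + 1)
  vapply(starts, function(i) {
    idx <- i:(i + width - 1)
    unname(coef(lm(ys[idx] ~ xs[idx]))[2])
  }, numeric(1))
}

# Toy check on exactly linear data: every window should have slope 2
xt <- c(5, 1, 4, 2, 3)
yt <- 1 + 2 * xt
local_slopes(xt, yt)
```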
Optimising the parameters then works using optimx with method="nlminb" (= the PORT algorithm):
preds = function (par, xnew = x) {
# xnew: x values at which to predict (defaults to the observed x)
B1 = min(x)+exp(par[["logB1minminx"]]) # to make sure that B1 > min(x)
B2 = B1+exp(par[["logB2minB1"]]) # to make sure that B2 > B1
b = exp(par[["logb"]]) # to make sure that slope b > 0
a = par[["a"]]
pred = (xnew < B1)*a +
(xnew >= B1)*(xnew <= B2)*(a + b*(xnew - B1)) +
(xnew > B2)*(a + b*(B2 - B1))
return(pred)
}
SSR <- function (par, x=x, y=y) { # sums of squares
fitted = preds(par)
SS = sum((y - fitted)^2)
return(SS) }
library(optimx)
fits = optimx(par = c(logB1minminx=log(650-min(x)), logB2minB1=log(1000-650), a=1.5, logb=log(mean(slopes))),
lower = c(logB1minminx=-100, logB2minB1=-100, a=min(y), logb=-100),
upper = c(logB1minminx=log(mean(x)-min(x)), logB2minB1=log(max(x)-min(x)), a=max(y), logb=log(max(slopes))),
fn = SSR,
x = x,
y = y,
method = "nlminb",
hessian=TRUE,
control=list(all.methods=TRUE, maxit=1000, starttests=FALSE))
fits
# logB1minminx logB2minB1 a logb value fevals gevals niter convcode kkt1 kkt2 xtimes
# L-BFGS-B 5.402100 5.859305 1.511979 -4.804957 6.405210e-01 41 41 NA 0 NA NA 0.01
# nlminb 5.402677 5.858434 1.512409 -4.804421 6.404725e-01 65 155 31 1 NA NA 0.00
# spg 5.402677 5.858560 1.512154 -4.804395 6.404726e-01 349 NA 195 0 NA NA 0.11
# Rcgmin NA NA NA NA 8.988466e+307 NA NA NA 9999 NA NA 0.00
# Rvmmin NA NA NA NA 8.988466e+307 NA NA NA 9999 NA NA 0.00
# bobyqa 5.402677 5.859331 1.511529 -4.804637 6.404949e-01 148 NA NA 0 NA NA 0.00
# nmkb NA NA NA NA 8.988466e+307 NA NA NA 9999 NA NA 0.00
# hjkb 5.147494 5.857933 1.500000 -5.218677 9.533185e+00 1 NA 0 9999 NA NA 0.00
xvals=seq(min(x),max(x),length.out=1000)
plot(x, y, col="black",pch=16)
lines(xvals,
preds(coef(fits)["nlminb",], xvals), col="blue")
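So this gives the fit described above. Most of the algorithms do not seem to converge, though. In addition, the calculation of the Hessian fails, which is a problem because I need standard errors for the coefficients as well as confidence and prediction intervals on the overall fit.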
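For reference, when the Hessian H of the residual sum of squares can be evaluated, approximate standard errors follow from cov ≈ 2·s²·H⁻¹, with s² = SSR/(n − p). The sketch below illustrates this on a smooth toy problem (a straight line fitted to simulated data) rather than on the piecewise model; all names in it are illustrative.

```r
set.seed(1)
xd <- seq(0, 10, length.out = 30)
yd <- 2 + 0.5 * xd + rnorm(30, sd = 0.3)   # simulated data, not the data above

# Residual sum of squares of a straight line with intercept par[1], slope par[2]
ssr <- function(par) sum((yd - (par[1] + par[2] * xd))^2)

opt <- optim(c(0, 0), ssr, method = "BFGS")
H   <- optimHess(opt$par, ssr)             # numerical Hessian of the SSR
s2  <- opt$value / (length(yd) - 2)        # residual variance estimate
se  <- sqrt(diag(2 * s2 * solve(H)))       # approximate standard errors

# Compare with the standard errors reported by lm()
cbind(hessian = se, lm = summary(lm(yd ~ xd))$coefficients[, "Std. Error"])
```

For a model with kinks, such as the three-phase one, the numerical Hessian can be unreliable near the breakpoints, which is one reason a smooth approximation is attractive.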
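Likewise, both nls and nlsLM return errors when I ask for a summary of the coefficients, which prevents me from using the investr package to calculate confidence and prediction intervals: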
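# NB: B1 appears below both as min(x)+logB1minminx and as min(x)+exp(logB1minminx); the model definition above uses the exp() form
nlsfit = nls(y ~ (x < (min(x)+logB1minminx))*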
a +
(x >= (min(x)+logB1minminx))*
(x <= (min(x)+exp(logB1minminx)+exp(logB2minB1)))*(a + exp(logb)*(x - (min(x)+logB1minminx))) +
(x > (min(x)+exp(logB1minminx)+exp(logB2minB1)))*
(a + exp(logb)*((min(x)+exp(logB1minminx)+exp(logB2minB1)) - (min(x)+logB1minminx))),
data = data.frame(x=x, y=y),
algorithm = "port",
start = c(logB1minminx=log(650-min(x)), logB2minB1=log(1000-650), a=1.6, logb=log(mean(slopes))),
control = nls.control(maxiter=1000, warnOnly=TRUE) )
summary(nlsfit)
# Error in chol2inv(object$m$Rmat()) :
# element (4, 4) is zero, so the inverse cannot be computed
library(investr)
predFit(nlsfit, newdata=data.frame(x=xvals), interval="prediction")
# Error in solve.default(crossprod(R1)) :
# Lapack routine dgesv: system is exactly singular: U[4,4] = 0
require(minpack.lm)
nlslmfit = nlsLM(y ~ (x < (min(x)+logB1minminx))*
a +
(x >= (min(x)+logB1minminx))*
(x <= (min(x)+exp(logB1minminx)+exp(logB2minB1)))*(a + exp(logb)*(x - (min(x)+logB1minminx))) +
(x > (min(x)+exp(logB1minminx)+exp(logB2minB1)))*
(a + exp(logb)*((min(x)+exp(logB1minminx)+exp(logB2minB1)) - (min(x)+logB1minminx))),
data = data.frame(x=x, y=y),
start = c(logB1minminx=log(650-min(x)), logB2minB1=log(1000-650), a=1.6, logb=log(mean(slopes))) )
# Error in nlsModel(formula, mf, start, wts) :
# singular gradient matrix at initial parameter estimates
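The "singular gradient" message means that the Jacobian of the model with respect to the parameters is rank-deficient at the starting values. A generic way to inspect this, sketched here with a hypothetical helper `jacobian_rank` and a deliberately redundant toy model (not the model above), is a finite-difference Jacobian followed by a QR rank check:

```r
# Finite-difference Jacobian of a model function f(par) and its numerical rank
jacobian_rank <- function(f, par, eps = 1e-6) {
  f0 <- f(par)
  J <- sapply(seq_along(par), function(j) {
    p <- par
    p[j] <- p[j] + eps          # perturb one parameter at a time
    (f(p) - f0) / eps
  })
  list(J = J, rank = qr(J)$rank, npar = length(par))
}

xq <- 1:10
# Toy model in which p1 and p2 only enter through their sum,
# so their Jacobian columns are (numerically) identical
redundant <- function(p) (p[["p1"]] + p[["p2"]]) * xq
res <- jacobian_rank(redundant, c(p1 = 1, p2 = 2))
res$rank < res$npar
```

A rank below the number of parameters indicates that at least one parameter cannot be identified locally from the data at those values.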
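Does anybody know how I could robustly fit this kind of model using nls or nlsLM, perhaps by using a smooth, continuously differentiable function that closely approximates the three-phase linear model above, which would allow the first derivative to be passed on to the optimizer? I tried a 4-parameter logistic model, but could not find a good smooth centrosymmetric function that came close enough to the three-phase linear model. If there are no clear breakpoints in the data, I would like B1 to be estimated at min(x) and B2 at max(x); if there is no lower breakpoint, I would like B1 to be estimated at min(x); and if there is no upper breakpoint, I would like B2 to be estimated at max(x). In other words, ideally the fit should also work for data in which the points merely follow a linear model. Any thoughts?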
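EDIT: I made some progress. I found a nice smooth approximation, and it fits well with nlsLM. It still fails if I try it on data without the upper breakpoint, though. I think I will have to fit several models (with two breakpoints, with a single breakpoint at the lower or upper end, or with no breakpoints) and see which one has the best AIC or BIC ...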
require(minpack.lm)
nlslmfit = nlsLM(y ~ a + (1/2)*exp(logb)*(B2-B1) + # we fit exp(logb) to force b > 0
(1/2)*sqrt(abs(exp(logb)*(4*1E-10+exp(logb)*(B1-x)^2))) - # now set s to 1E-10, we could also fit exp(logs)
(1/2)*sqrt(abs(exp(logb)*(4*1E-10+exp(logb)*(B2-x)^2))),
data = data.frame(x=x, y=y),
start = c(B1=min(x)+1E-10, B2=max(x)-1E-10, a=min(y)+1E-10, logb=log(max(slopes))),
# lower = c(B1=min(x), B2=mean(x), a=min(y), logb=log(min(slopes[slopes>0]))),
# upper = c(B1=mean(x), B2=max(x), a=mean(y), logb=log(max(slopes))),
control = nls.control(maxiter=1000, warnOnly=TRUE) )
# as s->0 this smooth model approximates more closely the piecewise linear one
summary(nlslmfit)
# Parameters:
# Estimate Std. Error t value Pr(>|t|)
# B1 699.99988 19.23569 36.39 < 2e-16 ***
# B2 1050.00069 15.49283 67.77 < 2e-16 ***
# a 1.50817 0.09636 15.65 1.57e-11 ***
# logb -4.80172 0.06347 -75.65 < 2e-16 ***
require(investr)
xvals=seq(min(x),max(x),length.out=100)
predintervals = data.frame(x=xvals,predFit(nlslmfit, newdata=data.frame(x=xvals), interval="prediction"))
confintervals = data.frame(x=xvals,predFit(nlslmfit, newdata=data.frame(x=xvals), interval="confidence"))
require(ggplot2)
qplot(data=predintervals, x=x, y=fit, ymin=lwr, ymax=upr, geom="ribbon", fill=I("red"), alpha=I(0.2)) +
geom_ribbon(data=confintervals, aes(x=x, ymin=lwr, ymax=upr), fill=I("blue"), alpha=I(0.2)) +
geom_line(data=confintervals, aes(x=x, y=fit), colour=I("blue"), lwd=2) +
geom_point(data=data.frame(x=x,y=y), aes(x=x, y=y, ymin=NULL, ymax=NULL), size=5, col="blue") +
ylab("y")
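To see how closely the smooth expression tracks the piecewise model, the two can be compared for a few values of the smoothing constant s. This is a sketch with assumed, rounded parameter values close to the estimates above; `smooth3` and `piecewise3` are illustrative names, and the abs() inside sqrt() is dropped because b > 0 here.

```r
# Smooth approximation used in the nlsLM formula, with s as an explicit argument
smooth3 <- function(x, a, b, B1, B2, s) {
  a + 0.5 * b * (B2 - B1) +
    0.5 * sqrt(b * (4 * s + b * (B1 - x)^2)) -
    0.5 * sqrt(b * (4 * s + b * (B2 - x)^2))
}

# Exact three-phase linear model
piecewise3 <- function(x, a, b, B1, B2) {
  a + b * (pmin(pmax(x, B1), B2) - B1)
}

xg <- seq(478, 1200, length.out = 200)
exact <- piecewise3(xg, a = 1.5, b = exp(-4.8), B1 = 700, B2 = 1050)

# Maximum absolute gap between the two curves for decreasing s
gaps <- sapply(c(1, 1e-2, 1e-4, 1e-10), function(s) {
  max(abs(smooth3(xg, a = 1.5, b = exp(-4.8), B1 = 700, B2 = 1050, s = s) - exact))
})
gaps
```

In the limit s -> 0 the square roots reduce to b*|x - B1| and b*|x - B2|, which recovers the three linear segments exactly.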
# on subset of data without lower breakpoint:
nlslmfit = nlsLM(y ~ a + (1/2)*exp(logb)*(B2-B1) + # we fit exp(logb) to force b > 0
(1/2)*sqrt(abs(exp(logb)*(4*1E-10+exp(logb)*(B1-x)^2))) - # now set s to 1E-10, we could also fit exp(logs)
(1/2)*sqrt(abs(exp(logb)*(4*1E-10+exp(logb)*(B2-x)^2))),
data = data.frame(x=x, y=y),
subset = x>760,
start = c(B1=min(x[x>760])+1E-10, B2=max(x)-1E-10, a=min(y)+1E-10, logb=log(max(slopes))),
# lower = c(B1=min(x), B2=mean(x), a=min(y), logb=log(min(slopes[slopes>0]))),
# upper = c(B1=mean(x), B2=max(x), a=mean(y), logb=log(max(slopes))),
control = nls.control(maxiter=1000, warnOnly=TRUE) )
summary(nlslmfit)
require(investr)
xvals=seq(760,max(x),length.out=100)
predintervals = data.frame(x=xvals,predFit(nlslmfit, newdata=data.frame(x=xvals), interval="prediction"))
confintervals = data.frame(x=xvals,predFit(nlslmfit, newdata=data.frame(x=xvals), interval="confidence"))
require(ggplot2)
qplot(data=predintervals, x=x, y=fit, ymin=lwr, ymax=upr, geom="ribbon", fill=I("red"), alpha=I(0.2)) +
geom_ribbon(data=confintervals, aes(x=x, ymin=lwr, ymax=upr), fill=I("blue"), alpha=I(0.2)) +
geom_line(data=confintervals, aes(x=x, y=fit), colour=I("blue"), lwd=2) +
geom_point(data=data.frame(x=x,y=y)[x>760,], aes(x=x, y=y, ymin=NULL, ymax=NULL), size=5, col="blue") +
ylab("y")
# on subset of data without upper breakpoint - here I still get an error:
nlslmfit = nlsLM(y ~ a + (1/2)*exp(logb)*(B2-B1) + # we fit exp(logb) to force b > 0
(1/2)*sqrt(abs(exp(logb)*(4*1E-10+exp(logb)*(B1-x)^2))) - # now set s to 1E-10, we could also fit exp(logs)
(1/2)*sqrt(abs(exp(logb)*(4*1E-10+exp(logb)*(B2-x)^2))),
data = data.frame(x=x, y=y),
subset = x<1040,
start = c(B1=min(x)+1E-10, B2=max(x[x<1040])-1E-10, a=min(y)+1E-10, logb=log(max(slopes))),
# lower = c(B1=min(x), B2=mean(x), a=min(y), logb=log(min(slopes[slopes>0]))),
# upper = c(B1=mean(x), B2=max(x), a=mean(y), logb=log(max(slopes))),
control = nls.control(maxiter=1000, warnOnly=TRUE) )
summary(nlslmfit)
require(investr)
xvals=seq(min(x),1040,length.out=100)
# here prediction & confidence intervals still fail though:
predintervals = data.frame(x=xvals,predFit(nlslmfit, newdata=data.frame(x=xvals), interval="prediction"))
# Error in solve.default(crossprod(R1)) :
# system is computationally singular: reciprocal condition number = 2.65525e-23
confintervals = data.frame(x=xvals,predFit(nlslmfit, newdata=data.frame(x=xvals), interval="confidence"))
require(ggplot2)
qplot(data=predintervals, x=x, y=fit, ymin=lwr, ymax=upr, geom="ribbon", fill=I("red"), alpha=I(0.2)) +
geom_ribbon(data=confintervals, aes(x=x, ymin=lwr, ymax=upr), fill=I("blue"), alpha=I(0.2)) +
geom_line(data=confintervals, aes(x=x, y=fit), colour=I("blue"), lwd=2) +
geom_point(data=data.frame(x=x,y=y)[x<1040,], aes(x=x, y=y, ymin=NULL, ymax=NULL), size=5, col="blue") +
ylab("y")
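The model-comparison idea mentioned in the edit (fit variants with two, one or no breakpoints and keep the one with the best AIC or BIC) could look roughly like the sketch below. It uses simulated data and a direct minimisation of the residual sum of squares with optim() purely for illustration; the helper `aic_ls`, the starting values and the data are assumptions.

```r
set.seed(42)
# Simulated data with two clear plateaus (illustrative only)
xs <- seq(0, 10, length.out = 60)
ys <- pmax(1, pmin(-2 + 1 * xs, 4)) + rnorm(60, sd = 0.2)

# Gaussian AIC up to an additive constant:
# p mean parameters plus one for the residual standard deviation
aic_ls <- function(ssr, n, p) n * log(ssr / n) + 2 * (p + 1)

# Candidate 1: straight line, no breakpoints
ssr_line <- sum(resid(lm(ys ~ xs))^2)

# Candidate 2: two breakpoints, fitted by minimising the SSR directly
ssr_fun <- function(p) sum((ys - pmax(p[1], pmin(p[2] + p[3] * xs, p[4])))^2)
fit2 <- optim(c(min(ys), coef(lm(ys ~ xs)), max(ys)), ssr_fun)

c(line = aic_ls(ssr_line, 60, 2), two_breaks = aic_ls(fit2$value, 60, 4))
```

The single-breakpoint variants can be added in the same way by dropping either the pmax() or the pmin() part; the candidate with the lowest value would be preferred.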
library(minpack.lm)
fo <- y ~ pmax(a1, pmin(a2 + b * x, a3))
co <- coef(lm(y ~ x))
fm <- nlsLM(fo, start = list(a1 = min(y), a2 = co[[1]], b = co[[2]], a3 = max(y)))
o <- order(x)
plot(y ~ x, subset = o)
lines(fitted(fm) ~ x, subset = o, col = "red")
summary(fm)
library(investr)
predFit(fm, data.frame(x), se = TRUE)
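In this pmax/pmin parameterisation the breakpoints are not parameters themselves; they follow from where the sloped line a2 + b*x crosses the two plateaus, that is B1 = (a1 - a2)/b and B2 = (a3 - a2)/b for b > 0. A small sketch with made-up coefficient values (the helper name `breakpoints` is hypothetical); with a fitted model one would pass coef(fm) instead.

```r
# Recover breakpoints from the coefficients of y ~ pmax(a1, pmin(a2 + b * x, a3))
breakpoints <- function(co) {
  c(B1 = (co[["a1"]] - co[["a2"]]) / co[["b"]],
    B2 = (co[["a3"]] - co[["a2"]]) / co[["b"]])
}

# Illustrative values only (not estimates from the data above)
co_demo <- c(a1 = 1.5, a2 = -4.1, b = 0.008, a3 = 4.3)
bp <- breakpoints(co_demo)
bp
```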