
Selecting the number of breakpoints in segmented regression

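I am trying to estimate multiple breakpoints in X for a response variable Y. When I run the segmented package in R, I get one estimated breakpoint at x=14 if I specify one starting point in the psi argument, and two estimated breakpoints at x=6.5 and x=11.4 if I specify two starting points in psi. How do I determine whether 2 breakpoints or 1 breakpoint is optimal? See the code and output below: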

Specifying 1 breakpoint:

segmented.glm(obj = fit.glm, seg.Z = ~x, psi = 10)

Estimated Break-Point(s):
       Est. St.Err
psi1.x   14  2.691

Null     deviance: 230311  on 1509  degrees of freedom
Residual deviance: 175795  on 1480  degrees of freedom
AIC: 11531
Convergence attained in 0 iter. (rel. change 1.5525e-08)

> slope(fit.seg)
$x
            Est.  St.Err.   t value CI(95%).l CI(95%).u
slope1 -0.847880 0.097683 -8.679900   -1.0393  -0.65643
slope2  0.036962 0.574770  0.064308   -1.0896   1.16350

Specifying 2 breakpoints:

fit.seg<-segmented(fit.glm, seg.Z=~x, psi= c(6, 11))
 
Estimated Break-Point(s):
        Est. St.Err
psi1.x  6.562  1.771
psi2.x 11.398  1.660

Null     deviance: 230311  on 1509  degrees of freedom
Residual deviance: 175594  on 1478  degrees of freedom
AIC: 11533
Convergence attained in 1 iter. (rel. change 0)

> slope(fit.seg)
$x
           Est. St.Err.  t value CI(95%).l CI(95%).u
slope1 -0.56943 0.23681 -2.40460  -1.03360  -0.10530
slope2 -1.25180 0.38974 -3.21190  -2.01570  -0.48794
slope3 -0.17365 0.31700 -0.54781  -0.79495   0.44765
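As a rough first check, the two summaries above can be compared on information criteria using only the printed numbers. The sketch below is base R arithmetic, not output from the segmented package. It assumes the sample size is the null degrees of freedom plus one, and that the drop in residual degrees of freedom equals the number of parameters added by the second breakpoint. Because the printed AIC values are rounded, the differences are approximate.

```r
# Values copied from the two summaries above (rounded as printed).
aic_1bp <- 11531
aic_2bp <- 11533
df_res_1bp <- 1480
df_res_2bp <- 1478
n <- 1509 + 1  # assumption: null df = n - 1 (intercept-only null model)

# Parameters added by the second breakpoint (slope change + breakpoint)
extra_par <- df_res_1bp - df_res_2bp

# Positive differences favour the simpler, 1-breakpoint fit
delta_aic <- aic_2bp - aic_1bp

# BIC = AIC + k * (log(n) - 2), so only the extra parameters matter here
delta_bic <- delta_aic + extra_par * (log(n) - 2)

print(c(delta_aic = delta_aic, delta_bic = round(delta_bic, 2)))
```

Under these assumptions both criteria are lower for the 1-breakpoint fit (a difference of 2 for AIC and roughly 12.6 for BIC), so on this evidence the second breakpoint does not look justified. This is only an informal comparison and does not replace a formal test.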

I also tried seg.control, but I don't know how to interpret the output (based on Muggeo, V.M.R. (2008). segmented: an R package to fit regression models with broken-line relationships. R News 8/1, 20–25).

> o <- segmented(fit.glm, seg.Z=~x, psi=NA, control=seg.control(display=FALSE, K=2))
Warning message:
max number of iterations (1) attained 
> slope(o)  # defaults to confidence level of 0.95 (conf.level=0.95)
$x
           Est. St.Err.  t value CI(95%).l CI(95%).u
slope1 -0.56943 0.23681 -2.40460  -1.03360  -0.10530
slope2 -1.25180 0.38974 -3.21190  -2.01570  -0.48794
slope3 -0.17365 0.31700 -0.54781  -0.79495   0.44765

> o <- segmented(fit.glm, seg.Z=~x, psi=NA, control=seg.control(display=FALSE, K=1))
Warning messages:
1: max number of iterations (1) attained 
2: max number of iterations (1) attained 
> slope(o)  # defaults to confidence level of 0.95 (conf.level=0.95)
$x
            Est.  St.Err.   t value CI(95%).l CI(95%).u
slope1 -0.847880 0.097683 -8.679900   -1.0393  -0.65643
slope2  0.036966 0.574770  0.064314   -1.0896   1.16350

Can anyone help me figure out how to determine whether 2 breakpoints or 1 breakpoint is the better estimate?

The function selgmented() (also in the R package segmented) is a wrapper that selects the "best" number of breakpoints via hypothesis testing (e.g. the score test) or the BIC. Currently, selection via hypothesis testing is limited to choosing 0, 1, or 2 breakpoints. Kind regards, Vito
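A minimal sketch of how that might look, run on simulated data rather than the data from the question. The data, the true breakpoint at x = 14, and the object names (fit.lm, sel.score, sel.bic, fit.1bp, extra) are all illustrative assumptions. A plain lm is used for the Gaussian model; the package documentation describes the first argument of selgmented() as an lm or glm fit, so passing fit.glm should be analogous, though that is not tested here. The last step uses pscore.test() with more.break = TRUE to test for one additional breakpoint beyond an existing one.

```r
library(segmented)

# Simulated data, NOT the question's data: one true breakpoint at x = 14.
set.seed(42)
n <- 400
x <- runif(n, 0, 20)
y <- 10 - 0.85 * x + 0.9 * pmax(x - 14, 0) + rnorm(n)
fit.lm <- lm(y ~ x)

# Sequential hypothesis testing (score test): chooses among 0, 1 or 2 breakpoints.
sel.score <- selgmented(fit.lm, seg.Z = ~x, type = "score")

# BIC-based selection: Kmax is the largest number of breakpoints considered.
sel.bic <- selgmented(fit.lm, seg.Z = ~x, type = "bic", Kmax = 3)
print(sel.bic)

# Score test for one extra breakpoint on top of a 1-breakpoint fit.
fit.1bp <- segmented(fit.lm, seg.Z = ~x, psi = 10)
extra <- pscore.test(fit.1bp, seg.Z = ~x, more.break = TRUE)
print(extra$p.value)
```

A large p-value in the last step would suggest that a second breakpoint is not supported. The two selection routes can disagree, so it is worth checking both.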
