简体   繁体   English

R中的分段线性和非线性回归

[英]Piece-wise linear and non-linear regression in R

I have a question which is perhaps more a statistical query than one related to r directly, however it may be that I am just invoking an r package incorrectly so I will post the question here. 我有一个问题,或许更多的是统计查询,而不是直接与r相关的问题,但是我可能只是错误地调用了r包,所以我将在这里发布问题。 I have the following dataset: 我有以下数据集:

x<-c(1e-08, 1.1e-08, 1.2e-08, 1.3e-08, 1.4e-08, 1.6e-08, 1.7e-08, 
1.9e-08, 2.1e-08, 2.3e-08, 2.6e-08, 2.8e-08, 3.1e-08, 3.5e-08, 
4.2e-08, 4.7e-08, 5.2e-08, 5.8e-08, 6.4e-08, 7.1e-08, 7.9e-08, 
8.8e-08, 9.8e-08, 1.1e-07, 1.23e-07, 1.38e-07, 1.55e-07, 1.76e-07, 
1.98e-07, 2.26e-07, 2.58e-07, 2.95e-07, 3.25e-07, 3.75e-07, 4.25e-07, 
4.75e-07, 5.4e-07, 6.15e-07, 6.75e-07, 7.5e-07, 9e-07, 1.15e-06, 
1.45e-06, 1.8e-06, 2.25e-06, 2.75e-06, 3.25e-06, 3.75e-06, 4.5e-06, 
5.75e-06, 7e-06, 8e-06, 9.25e-06, 1.125e-05, 1.375e-05, 1.625e-05, 
1.875e-05, 2.25e-05, 2.75e-05, 3.1e-05)

y2<-c(-0.169718017273307, 7.28508517630734, 71.6802510299446, 164.637259265704, 
322.02901173786, 522.719633360006, 631.977073772459, 792.321270345847, 
971.810607095548, 1132.27551798986, 1321.01923840546, 1445.33152600664, 
1568.14204073109, 1724.30089942149, 1866.79717333592, 1960.12465709003, 
2028.46548012508, 2103.16027631327, 2184.10965255236, 2297.53360080873, 
2406.98288043262, 2502.95194879366, 2565.31085776325, 2542.7485752473, 
2499.42610084412, 2257.31567571328, 2150.92120390084, 1998.13356362596, 
1990.25434682546, 2101.21333152526, 2211.08405955931, 1335.27559108724, 
381.326449703455, 430.9020598199, 291.370887491989, 219.580548355043, 
238.708972427248, 175.583544448326, 106.057481792519, 59.8876372379487, 
26.965143266819, 10.2965349811467, 5.07812046132922, 3.19125838983254, 
0.788251933518549, 1.67980552001939, 1.97695007279929, 0.770663673279958, 
0.209216903989619, 0.0117903221723813, 0.000974437796492681, 
0.000668823762763647, 0.000545308757270207, 0.000490042305650751, 
0.000468780182460397, 0.000322977916070751, 0.000195423690538495, 
0.000175847622407421, 0.000135771259866332, 9.15607623591363e-05)

which when plot looks like this: 当情节看起来像这样: 分段测试

I have then attempted to use the segmentation package to generate three linear regressions (solid black line) in three regions (10^⁻8--10^⁻7,10^⁻7--10^⁻6 and >10^-6) since I have a theoretical basis for finding different relationships in these different regions. 然后我尝试使用分割包在三个区域产生三个线性回归(实线黑色)(10 ^ - 8--10 ^ - 7,10 ^ - 7--10 ^ - 6和> 10 ^ -6因为我有在这些不同地区寻找不同关系的理论基础。 Clearly however my attempt using the following code was unsuccessful: 显然,我尝试使用以下代码是不成功的:

lin.mod <- lm(y2~x)
segmented.mod <- segmented(lin.mod, seg.Z = ~x, psi=c(0.0000001,0.000001))

Thus my first question- are there further parameters of the segmentation I can tweak other than the breakpoints? 因此,我的第一个问题是,我可以调整除断点之外的其他参数吗? So far as I understand I have iterations set to maximum as default here. 据我所知,我将迭代次数设置为默认值。

My second question is: could I perhaps attempt a segmentation using the nls package? 我的第二个问题是:我可以尝试使用nls包进行细分吗? It looks as though the first two regions on the plot (10^⁻8--10^⁻7 and 10^-7--10^-6) are further from linear then the final section so perhaps a polynomial function would be better here? 看起来情节中的前两个区域(10 ^ - 8--10 ^ - 7和10 ^ -7--10 ^ -6)离线性更远,然后是最后一个区域,所以也许多项式函数会更好这里?

As an example of a result I find acceptable I have annoted the original plot by hand: 作为结果的一个例子,我觉得可以接受,我手工注释了原始情节: 带注释的分段示例 .

Edit: The reason for using linear fits is the simplicity they provide, to my untrained eye it would require a fairly complex nonlinear function to regress the dataset as a single unit. 编辑:使用线性拟合的原因是它们提供的简单性,对于未经训练的眼睛,它需要相当复杂的非线性函数来将数据集作为单个单元回归。 One thought that had crossed my mind was to fit a lognormal model to the data as this may work given the skew along a log x-axis. 我想到的一个想法是将对数正态模型拟合到数据中,因为这可能在沿着对数x轴的偏斜时起作用。 I do not have enough competence in R to do this however as my knowledge only extends to fitdistr which so far as I understand would not work here. 我没有足够的能力在R做这个,但是因为我的知识只延伸到fitdistr,据我所知,这在这里不起作用。

Any help or guidance in a relevant direction would be most appreciated. 任何相关方向的帮助或指导都将非常受欢迎。

If you are not satisfied with the segmented package, you can try the earth package with the mars algorithm. 如果您对segmented包不满意,可以尝试使用mars算法的earth包。 But here, I find that the result of the segmented model is very acceptable. 但在这里,我发现分段模型的结果是非常可接受的。 see the R-Squared below. 看下面的R-Squared。

lin.mod <- lm(y2~x)
segmented.mod <- segmented(lin.mod, seg.Z = ~x, psi=c(0.0000001,0.000001))
 summary(segmented.mod)

Meaningful coefficients of the linear terms:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.163e+02  1.143e+02  -1.893   0.0637 .  
x            4.743e+10  3.799e+09  12.485   <2e-16 ***
U1.x        -5.360e+10  3.824e+09 -14.017       NA    
U2.x         6.175e+09  4.414e+08  13.990       NA    

Residual standard error: 232.9 on 54 degrees of freedom
Multiple R-Squared: 0.9468,  Adjusted R-squared: 0.9419 

Convergence attained in 5 iterations with relative change 3.593324e-14 

You can check the result by plotting the model : 您可以通过绘制模型来检查结果:

plot(segmented.mod)

在此输入图像描述

To get the coefficient of the plots , you can do this: 要获得绘图的系数,您可以这样做:

 intercept(segmented.mod)
$x
              Est.
intercept1 -216.30
intercept2 3061.00
intercept3   46.93

> slope(segmented.mod)
$x
             Est.   St.Err.  t value  CI(95%).l  CI(95%).u
slope1  4.743e+10 3.799e+09  12.4800  3.981e+10  5.504e+10
slope2 -6.177e+09 4.414e+08 -14.0000 -7.062e+09 -5.293e+09
slope3 -2.534e+06 5.396e+06  -0.4695 -1.335e+07  8.285e+06

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM