How to implement a nonlinear regression model in R

I am quite new to both R and statistics and really need your help. I need to analyze some data to find an analytical model that describes it. I have 2 responses (y1, y2) and 4 predictors. I decided to do the analysis in R and followed these steps: 1) For each response, I fitted a linear model (lm command) and got:

Call:
lm(formula = data_mass$m ~ ., data = data_mass)

Residuals:
       Min         1Q     Median         3Q        Max 
-7.805e-06 -1.849e-06 -1.810e-07  2.453e-06  7.327e-06 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.367e-04  1.845e-05  -7.413 1.47e-06 ***
d            1.632e-04  1.134e-05  14.394 1.42e-10 ***
L            2.630e-08  1.276e-07   0.206  0.83927    
D            1.584e-05  5.103e-06   3.104  0.00682 ** 
p            1.101e-06  1.195e-07   9.215 8.46e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.472e-06 on 16 degrees of freedom
Multiple R-squared:  0.9543,    Adjusted R-squared:  0.9429 
F-statistic: 83.51 on 4 and 16 DF,  p-value: 1.645e-10

2) So I analyzed how good the model is by taking a look at plot(model) graphs. Looking at the "residual vs fitted value" plot, the model should not be linear!! Is it correct?

3) I tried to eliminate some factors (like "L") and to introduce some quadratic terms (d^2 ; D^2), but the "residual vs fitted value" plot has the same trend.

What can I do now? Should I use a non-linear model?

Thank you to everyone who can help me =)

UPDATE :

Thank you again. I attached the plot(model) graphs and the data. The responses are m and Fz, and the predictors are d, L, D, p. The model shown above is the linear model for response m.

[Residual vs Fitted][1]
[Normal Q-Q][2]
[Scale Location][3]
[Residual vs Leverage][4]
[DATA][5]

Looking at the "residual vs fitted value" plot, the model should not be linear!! Is it correct?

Yes and no. If the absolute values of the residuals are strongly correlated with the fitted values, that can indicate heteroscedasticity (non-constant variance): the residuals are then not equally spread along the fitted values. Heteroscedasticity is one of the things to look for on the residuals vs fitted plot, because it can invalidate the statistical tests that lm reports, such as the *t*-tests on the coefficients. You can also confirm it with the scale-location plot, which shows similar information but makes a trend in the spread easier to see.
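As a rough illustration of that check, here is a minimal sketch in base R. The data are simulated (hypothetical `x` and `y`, not the data from the question), with a noise level that deliberately grows with `x`, and the Spearman rank correlation is just one simple way to quantify the trend in spread.

```r
# Simulated (hypothetical) data: the noise standard deviation grows with x
set.seed(1)
x <- runif(200, 1, 10)
y <- 2 + 3 * x + rnorm(200, sd = 0.5 * x)

fit <- lm(y ~ x)

# Rank correlation between |residuals| and fitted values.
# A clearly positive value suggests the spread increases with the fit.
het_check <- cor.test(abs(resid(fit)), fitted(fit), method = "spearman")
print(het_check$estimate)
print(het_check$p.value)

# In an interactive session, the scale-location plot shows the same thing:
# plot(fit, which = 3)
```

On your own model you would replace `fit` with your fitted lm object. A correlation near zero does not prove constant variance, so it is worth looking at the plot as well.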

On the other hand, a curved pattern in the residuals indicates nonlinearity, and you would probably want to change the structure of your model. You don't want either a linear or a nonlinear relationship between residuals and fitted values, though: in the ideal case, the values are scattered more or less randomly and symmetrically around 0, between two parallel lines with 0 slope. You can find more discussion on the issue here: 1 2 3
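To make the difference concrete, the sketch below fits a straight line to two simulated data sets (hypothetical, not the data from the question): one that really is linear and one that is quadratic. The helper `resid_trend` is my own illustrative function, not part of R; it measures how much of the residual variation a quadratic curve in the fitted values can explain.

```r
# Two simulated (hypothetical) responses: one curved, one truly linear
set.seed(2)
x <- runif(100, 0, 5)
y_curved <- 1 + 2 * x^2 + rnorm(100, sd = 1)
y_linear <- 1 + 2 * x + rnorm(100, sd = 1)

fit_bad <- lm(y_curved ~ x)  # misspecified: straight line through a curve
fit_ok  <- lm(y_linear ~ x)  # correctly specified

# Illustrative helper: R-squared of a quadratic trend in residuals vs fitted.
# Near 0 means no visible pattern; near 1 means strong curvature is left over.
resid_trend <- function(fit) {
  r <- resid(fit)
  f <- fitted(fit)
  summary(lm(r ~ poly(f, 2)))$r.squared
}

trend_bad <- resid_trend(fit_bad)
trend_ok  <- resid_trend(fit_ok)
print(c(curved = trend_bad, linear = trend_ok))
```

The misspecified fit should leave a strong U-shaped trend in its residuals, while the correct one should leave almost none. This number is only a companion to the plot, not a formal test.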

What can I do now? Should I use a non-linear model?

If your diagnostic plots indicate nonlinearity, you may want to change, restructure, or readjust your model (or transform the data). There is some discussion of the options here
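A few of those options can be sketched in base R as follows. The data are simulated from a made-up power law (the names `d` and `m` only echo the question; the numbers are hypothetical), so this shows the mechanics rather than the right model for your data. One thing worth checking first: inside a formula, `^` is a formula operator, so `d^2` silently collapses to `d` unless it is wrapped in `I()`.

```r
# Simulated (hypothetical) power law with small multiplicative noise
set.seed(3)
d <- runif(60, 0.5, 3)
m <- 0.2 * d^2.5 * exp(rnorm(60, sd = 0.05))

# Option 1: polynomial terms. Note the I() wrapper.
fit_lin  <- lm(m ~ d)
fit_nop  <- lm(m ~ d + d^2)     # NOT quadratic: identical to fit_lin
fit_quad <- lm(m ~ d + I(d^2))  # genuine quadratic term

# Option 2: transform the data. A log-log fit linearizes a power law.
fit_log <- lm(log(m) ~ log(d))

# Option 3: nonlinear least squares on the original scale,
# using the log-log estimates as starting values.
start <- list(a = exp(coef(fit_log)[[1]]), b = coef(fit_log)[[2]])
fit_nls <- nls(m ~ a * d^b, start = start)

print(coef(fit_nop))
print(coef(fit_quad))
print(coef(fit_nls))
```

`nls` needs reasonable starting values and a functional form chosen by you, so it helps to have some physical idea of how the response should depend on the predictors before reaching for it.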
