
Is my R formula equivalent to the statistical model I have in mind?

Problem:

Building statistical models with formulas is a powerful and elegant feature of the R language. One of the reasons I haven't used formulas as much as I should is that the syntax is a bit confusing (for example, x*y does not simply mean "the product of x and y").

Question:

I am looking for a method to make sure that I have used the formula syntax correctly and that the formula I entered really implements the statistical model I have in mind. Ideally, I would like to have this confirmation before actually fitting the model.

Example:

Say I want to find the parameters a and b of the model y = a + b*(x1*x2) by linear regression. Naively, I enter this in R:

df <- data.frame(y=seq(5), x1=runif(5), x2=runif(5)) # toy data
lm(y ~ x1*x2, data=df)    # this is wrong: x1*x2 expands to x1 + x2 + x1:x2

I can tell from the output of lm that this is not what I wanted, because it contains extra coefficients for x1 and x2. But it should be possible to debug the formula before calling the fitting function. (The correct way to fit this model is lm(y ~ x1:x2, data=df).)

One way to debug a formula before running the model is to combine formula and update, which expands the formula into its individual terms:

f <- formula( y ~ x1*x2)
update( f , terms( f ) )
# y ~ x1 + x2 + x1:x2

f <- formula( y ~ x1:x2)
update( f , terms( f ) )
# y ~ x1:x2
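
The same check can be done programmatically, which may be handy inside a script. The following is a minimal sketch, not the only way to do it: it reads the "term.labels" attribute of terms() and looks at the column names that model.matrix() would hand to a fitting function. The variable names (f_star, f_colon, labels_star, labels_colon, X) and the seed are illustrative assumptions; the toy data mirrors the example in the question.

```r
# Expand each formula into its terms without fitting anything.
f_star  <- y ~ x1 * x2
f_colon <- y ~ x1:x2

labels_star  <- attr(terms(f_star), "term.labels")
labels_colon <- attr(terms(f_colon), "term.labels")
print(labels_star)   # main effects plus the interaction
print(labels_colon)  # the interaction only

# The design matrix shows the columns a fitting function would receive.
set.seed(1)  # assumed seed, only to make the toy data reproducible
df <- data.frame(y = seq(5), x1 = runif(5), x2 = runif(5))
X <- model.matrix(f_colon, data = df)
print(colnames(X))

# For numeric predictors, the x1:x2 column is the elementwise product x1 * x2.
print(all.equal(unname(X[, "x1:x2"]), df$x1 * df$x2))
```

If the column names of the design matrix match the coefficients of the model you have in mind, the formula is specified as intended. Note that the product interpretation of x1:x2 holds for numeric variables; factors are expanded into indicator columns instead.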

Incidentally, you can also state the intercept term (i.e. the coefficient a) explicitly by including a 1 (since 1*a = a). Because R includes the intercept by default, this is equivalent:

f <- formula( y ~ 1 + x1:x2)
update( f , terms( f ) )
# y ~ x1:x2
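
Because update() drops the explicit 1 from the printed formula, it can be reassuring to confirm directly whether an intercept is present. A minimal sketch, assuming the same formulas as above: the "intercept" attribute of terms() is 1 when the model has an intercept and 0 when it has been removed (for example with 0 + or - 1). The names f_implicit, f_explicit, f_none and the int_* variables are illustrative only.

```r
# Three ways of writing the formula: implicit intercept, explicit intercept,
# and intercept removed.
f_implicit <- y ~ x1:x2
f_explicit <- y ~ 1 + x1:x2
f_none     <- y ~ 0 + x1:x2

int_implicit <- attr(terms(f_implicit), "intercept")
int_explicit <- attr(terms(f_explicit), "intercept")
int_none     <- attr(terms(f_none), "intercept")

print(int_implicit)  # 1: intercept included by default
print(int_explicit)  # 1: same model, intercept written out
print(int_none)      # 0: intercept suppressed
```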
