
When using lm() in R, should a formula be passed as a character string?

I found some strange behavior in R when using lm().

Using the cars dataset, the following functions are meant to plot the fitted braking distance from a localized linear regression centred at speed 30.

func1 <- function(fm, spd){
  w <- dnorm(cars$speed - spd, sd=5)
  fit <- lm(formula = as.formula(fm), weights = w, data=cars)
  plot(fitted(fit))
}

func2 <- function(fm, spd){
  w <- dnorm(cars$speed - spd, sd=5)
  fit <- lm(formula = fm, weights = w, data=cars)
  plot(fitted(fit))
}

func1("dist ~ speed", 30)
func2(dist ~ speed, 30)

func1 works, but func2 fails with the following message:

Error in eval(expr, envir, enclos) : object 'w' not found

The only difference between the two functions is that func2 receives a formula object as its argument, while func1 receives a character string and converts it with as.formula().

When using lm() in this style, should the formula be passed as a character string?

I tested this with R 3.2.1 and RStudio 0.99.467 on Windows 7.

Very interesting case! This is closely tied to how environments work in R. In short, it is risky to pass a formula object created outside a function into that function when the model call relies on variables that are local to the function. There are ways to work around this, but the behavior can be surprising.

?formula says:

A formula object has an associated environment, and this environment (rather than the parent environment) is used by model.frame to evaluate variables that are not found in the supplied data argument.

In your func1, the formula is created inside the function, so it is associated with the function's environment (every function call creates its own environment). When objects are not found in data, the lm call therefore looks for them in the function's environment. That is how w is found in func1.

In func2, the formula is created outside the function, more precisely in the global environment. The formula therefore looks for objects in the global environment when they are not found in data. Since there is no w in the global environment, the call fails. Worse, if another w happens to exist in the global environment, that w is silently used as the weights instead.
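One way to avoid depending on the formula's environment at all is to store the weights as a column of the data frame, since data is searched before any environment. The following is only a sketch of that idea: fit_local and the column name w are hypothetical names chosen for illustration, not part of the original question.

```r
# Hypothetical helper (not from the original post): keep the weights inside
# the data frame so that lm() finds them in `data` and never has to search
# the environment attached to the formula.
fit_local <- function(fm, spd, sd = 5) {
  dat <- cars                                  # built-in dataset used in the question
  dat$w <- dnorm(dat$speed - spd, sd = sd)     # kernel weights centred at spd
  lm(formula = fm, weights = w, data = dat)    # w resolves to the column dat$w
}

# Works whether the formula is created as an object or from a string.
fit_obj <- fit_local(dist ~ speed, 30)
fit_chr <- fit_local(as.formula("dist ~ speed"), 30)
```

With this approach, a stray w in the global environment cannot be picked up by accident, because the column in the data frame takes precedence.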

Here is an example that highlights the search order. The data contains only y, so the lm call has to look for x elsewhere, and there are two objects named x. The formula fm, created in the global environment, finds x = 1:10, while as.formula(ch), created inside the function, finds x = 10:1. environment() tells you which environment a formula is associated with.

fun <- function(fm, ch) {
  x <- 10:1
  dat <- data.frame(y = 1:10)

  print(environment(fm))
  print(lm(fm, data = dat))
  cat("<--- refers to x in the global\n") 

  print(environment(as.formula(ch)))
  print(lm(as.formula(ch), data = dat))
  cat("<--- refers to x in the function\n\n")
}

x <- c(1:10)
fun(y ~ x, "y ~ x")
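The same comparison can be made without reading printed output by extracting the fitted slopes. This is a minimal sketch in which slopes is a hypothetical helper name. Because y = 1:10, a slope close to 1 indicates that x = 1:10 (defined outside the function) was used, and a slope close to -1 indicates that the function's local x = 10:1 was used.

```r
# Hypothetical helper (not from the original answer): return the slope on x
# obtained from a formula object and from a formula built inside the function.
slopes <- function(fm, ch) {
  x <- 10:1                        # local x, reversed relative to the outer x
  dat <- data.frame(y = 1:10)      # data has y only, so x is looked up elsewhere
  c(object    = unname(coef(lm(fm, data = dat))["x"]),
    character = unname(coef(lm(as.formula(ch), data = dat))["x"]))
}

x <- 1:10                          # x visible where the formula object is created
res <- slopes(y ~ x, "y ~ x")
print(res)
```

The two entries of res should have opposite signs, which shows that each formula resolved x in its own environment.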

See also: Environments - Advanced R.
