How can I replace one term in an R formula with two?

I have something along the lines of

y ~ x + z

And I would like to transform it to

y ~ x_part1 + x_part2 + z

More generally, I would like a function that takes a formula and returns it with every term that matches "^x$" replaced by "x_part1" and "x_part2". Here's my current solution, but it feels kludgy:

my.formula <- fruit ~ apple + banana
var.to.replace <- 'apple'
my.terms <- labels(terms(my.formula))
new.terms <- paste0('(', 
                    paste0(var.to.replace, 
                           c('_part1', '_part2'),
                           collapse = '+'),
                    ')')
new.formula <- reformulate(termlabels = gsub(pattern = var.to.replace,
                                             replacement = new.terms,
                                             x = my.terms),                                 
                           response = my.formula[[2]])

An additional caveat is that the input formula may be specified with interactions.

y ~ b*x + z

should produce one of these (equivalent) formulae:

y ~ b*(x_part1 + x_part2) + z
y ~ b + (x_part1 + x_part2) + b:(x_part1 + x_part2) + z
y ~ b + x_part1 + x_part2 + b:x_part1 + b:x_part2 + z

MrFlick has advocated the use of

substitute(y ~ b*x + z, list(x=quote(x_part1 + x_part2)))

but when I have stored the formula I want to modify in a variable, as in

my.formula <- fruit ~ x + banana

This approach seems to require a little more massaging:

substitute(my.formula, list(x=quote(apple_part1 + apple_part2)))
# my.formula

The necessary change to that approach was:

do.call(what = 'substitute',
        args = list(my.formula, list(x=quote(x_part1 + x_part2))))

But I can't figure out how to use this approach when both 'x' and c('x_part1', 'x_part2') are stored in named variables, e.g. var.to.replace and new.terms above.

You can use the substitute function for this:

substitute(y ~ b*x + z, list(x=quote(x_part1 + x_part2)))
# y ~ b * (x_part1 + x_part2) + z

Here we use the named list to tell R to replace the variable x with the expression x_part1 + x_part2.
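
One detail worth checking: substitute() returns an unevaluated call rather than a formula object. A minimal sketch, assuming you want a proper formula back (the names new_call and new_formula are only for illustration), is to evaluate the result:

```r
# substitute() gives back a language object (a call to `~`), not a formula
new_call <- substitute(y ~ b*x + z, list(x = quote(x_part1 + x_part2)))
class(new_call)
#> [1] "call"

# Evaluating the call runs `~`, which builds a formula with an environment
new_formula <- eval(new_call)
class(new_formula)
#> [1] "formula"
```

Evaluating the call makes the class and the environment of the result explicit before it is passed on.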

You can write a recursive function to modify the expression tree of the formula:

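replace_term <- function(f, old, new){
  n <- length(f)
  # A call (including a formula) has length > 1: recurse into each element
  if(n > 1) {
    for(i in 1:n) f[[i]] <- Recall(f[[i]], old, new)

    return(f)
  }

  # A leaf (symbol or constant): swap it if it matches `old`
  if(f == old) new else f
}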

You can use it to modify, e.g., interactions:

> replace_term(y~x*a+z - x, quote(x), quote(x1 + x2))
y ~ (x1 + x2) * a + z - (x1 + x2)
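
When the formula and the names live in variables, as in the question, the same function can be called with as.name() and str2lang() converting the strings. The following is only a sketch: replace_term() is repeated so the snippet runs on its own, and str2lang() assumes R 3.6.0 or later.

```r
# Same recursive helper as above, repeated so this block is self-contained
replace_term <- function(f, old, new){
  n <- length(f)
  if(n > 1) {
    for(i in 1:n) f[[i]] <- Recall(f[[i]], old, new)
    return(f)
  }
  if(f == old) new else f
}

my.formula <- fruit ~ apple + banana
var.to.replace <- "apple"
new.terms <- paste0(var.to.replace, c("_part1", "_part2"))

# as.name() turns the string into a symbol; str2lang() parses the sum
new.formula <- replace_term(my.formula,
                            as.name(var.to.replace),
                            str2lang(paste(new.terms, collapse = " + ")))
all.vars(new.formula)
#> [1] "fruit"       "apple_part1" "apple_part2" "banana"
```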

How about working with the formula as a string? Many base R modelling functions, such as lm(), accept string formulas (and you can always convert with formula() otherwise). In this case, you can use something like gsub():

f1 <- "y ~ x + z"
f2 <- "y ~ b*x + z"

gsub("x", "(x_part1 + x_part2)", f1)
#> [1] "y ~ (x_part1 + x_part2) + z"

gsub("x", "(x_part1 + x_part2)", f2)
#> [1] "y ~ b*(x_part1 + x_part2) + z"
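
One caveat with a bare pattern like "x" is that it also matches the letter inside other names. A hedged sketch of one way around this, using word boundaries (the formula f3 and the name safe are made up for illustration):

```r
f3 <- "y ~ x + exp(z) + x2"

# A bare "x" also hits the x inside exp() and x2, mangling the formula
gsub("x", "(x_part1 + x_part2)", f3)

# \\b restricts the match to x standing alone as a word
safe <- gsub("\\bx\\b", "(x_part1 + x_part2)", f3, perl = TRUE)
safe
#> [1] "y ~ (x_part1 + x_part2) + exp(z) + x2"
```

Note that "." is not a word character in a regex, so a name such as x.y would still be matched; the word-boundary pattern assumes variable names without dots.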

For example, with the mtcars data set, say we want to replace mpg (x) with disp + hp (x_part1 + x_part2):

f1 <- "qsec ~ mpg + cyl"
f2 <- "qsec ~ wt*mpg + cyl"

f1 <- gsub("mpg", "(disp + hp)", f1)
f2 <- gsub("mpg", "(disp + hp)", f2)

lm(f1, data = mtcars)
#> 
#> Call:
#> lm(formula = f1, data = mtcars)
#> 
#> Coefficients:
#> (Intercept)         disp           hp          cyl  
#>    22.04376      0.01017     -0.02074     -0.56571

lm(f2, data = mtcars)
#> 
#> Call:
#> lm(formula = f2, data = mtcars)
#> 
#> Coefficients:
#> (Intercept)           wt         disp           hp          cyl  
#>   20.421318     1.554904     0.026837    -0.056141    -0.876182  
#>     wt:disp        wt:hp  
#>   -0.006895     0.011126

If you just want to modify main effects, you can subtract x and add the two new variables:

> f <- y ~ x + z
> update(f, .~.-x+x_part1 + x_part2)
y ~ z + x_part1 + x_part2
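
To see why this is limited to main effects, here is a small sketch applying the same update() call to the interaction formula from the question (f_int, f_new and labs are illustrative names):

```r
f_int <- y ~ b*x + z

# Removing x drops only the main effect; the b:x interaction is kept
f_new <- update(f_int, . ~ . - x + x_part1 + x_part2)
labs <- labels(terms(f_new))

"b:x" %in% labs                 # the old interaction is still there
any(grepl("b:x_part", labs))    # no interaction with the new terms
```

So with interactions in the formula, update() alone leaves b:x in place and does not create b:x_part1 or b:x_part2.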

As requested by rcorty, storing 'x' and c('x_part1', 'x_part2') in var.to.replace and new.terms, respectively, and adopting MrFlick's suggestion to use setNames, we could perhaps do the following:

my.formula <- fruit ~ x + banana
var.to.replace <- "x"
new.terms <- c('x_part1', 'x_part2')
new.terms1 <- paste(new.terms, collapse="+")
do.call("substitute", list(my.formula, setNames(list(str2lang(new.terms1)), var.to.replace)))
# fruit ~ x_part1 + x_part2 + banana
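
These steps could be wrapped in a small helper. The following is only a sketch: the name replace_formula_term and its arguments are made up for illustration, it assumes syntactically valid variable names, and it needs R 3.6.0 or later for str2lang().

```r
replace_formula_term <- function(f, old, new) {
  # Build the replacement expression, e.g. x_part1 + x_part2
  replacement <- str2lang(paste(new, collapse = " + "))
  # substitute() does not evaluate its first argument, so go through do.call()
  out <- do.call("substitute", list(f, setNames(list(replacement), old)))
  # Evaluate in the original formula's environment so a formula comes back
  eval(out, environment(f))
}

r1 <- replace_formula_term(fruit ~ x + banana, "x", c("x_part1", "x_part2"))
labels(terms(r1))
#> [1] "x_part1" "x_part2" "banana"

# Interactions are expanded by the formula machinery itself
r2 <- replace_formula_term(y ~ b*x + z, "x", c("x_part1", "x_part2"))
labels(terms(r2))
```

Because the replacement is done on the expression tree, only the standalone symbol is swapped, and names that merely contain the same letters are left alone.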

As an aside, I have found Paul Johnson's Rchaeology (Section 2.1) relevant, educational, and entertaining.

