How can I remove two consecutive pluses (+) from a formula/string?

For example, I have a formula like this:

main_var ~ 0 + var1:x + var2:y + var3 + + var4 + (0 + main_var|x_y) + (0 + add_var|x_y) + (1|x_y)

How can I remove two consecutive pluses (+) between var3 and var4 (and leave only one)?

It's possible to edit a formula's component parts without coercing to string. Formulas contain two parts, an expression (the part you write) and an environment (where you write it, maybe with variables in it referred to in the expression). The environment we want to hold on to; the expression we want to change.
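As a small sketch of those two parts (the helper `make_formula` and the variable `local_var` are made up for illustration), a formula created inside a function carries that function's frame as its environment:

```r
# Hypothetical helper: the formula is created inside a function call,
# so its environment is that call's frame rather than the global one.
make_formula <- function() {
    local_var <- 1:3
    y ~ x + local_var
}

g <- make_formula()

# The expression part: a call to `~` with class "formula"
class(g)
#> [1] "formula"
typeof(g)
#> [1] "language"

# The environment part: it still holds the variable defined in the function
exists("local_var", envir = environment(g), inherits = FALSE)
#> [1] TRUE
```

This is why rebuilding a formula from scratch can silently change where its variables are looked up, and why the approach here keeps the original environment.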

Expressions (by which here I mean language objects like symbols and calls, not the narrowly-defined expression class) are syntax trees, which behave a bit like lists. They can be subset:

f <- main_var ~ 0 + var1:x + var2:y + var3 + + var4 + (0 + main_var|x_y) + (0 + add_var|x_y) + (1|x_y)

f[[1]]
#> `~`
f[[2]]
#> main_var
f[[3]]
#> 0 + var1:x + var2:y + var3 + +var4 + (0 + main_var | x_y) + (0 + 
#>     add_var | x_y) + (1 | x_y)
f[[3]][[3]]
#> (1 | x_y)

and therefore iterated upon. Because they're tree-like structures, to iterate over the whole tree, we need to recurse. Most of the function is pretty typical for recursion (return atomic leaf nodes; recurse over nodes with children), but the tricky part is the condition to identify the part we want to change. If you look at the node in question, it contains a unary (with one argument) + call:

f <- main_var ~ 0 + var1:x + var2:y + var3 + + var4 + (0 + main_var|x_y) + (0 + add_var|x_y) + (1|x_y)
f[[3]][[2]][[2]][[2]][[3]]
#> +var4
f[[3]][[2]][[2]][[2]][[3]][[1]]
#> `+`
f[[3]][[2]][[2]][[2]][[3]][[2]]
#> var4

All other + calls are binary. We can thus check for length-2 nodes whose first element is +. As it turns out, getting hold of the + symbol is also a bit tricky; the simplest ways are expression(`+`)[[1]] or quote(+1)[[1]], but once you have that, equality checking works as usual.
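To make the check concrete, here is a minimal sketch. The helpers `is_unary_plus` and `find_unary_plus` are names invented for this example, and the formula is a shortened version of the one in the question. It compares a few ways of obtaining the + symbol, then walks the tree and reports every unary + node it finds:

```r
# Three ways to obtain the `+` symbol; all give the same object
plus_a <- expression(`+`)[[1]]
plus_b <- quote(+1)[[1]]
plus_c <- as.name("+")
identical(plus_a, plus_b) && identical(plus_a, plus_c)
#> [1] TRUE

# Hypothetical helper: TRUE only for a `+` call with exactly one argument
is_unary_plus <- function(node) {
    is.call(node) && length(node) == 2 && identical(node[[1]], as.name("+"))
}

# Hypothetical helper: recurse over the tree and collect the deparsed text
# of every unary plus node (leaves such as symbols and numbers yield nothing)
find_unary_plus <- function(node) {
    if (!is.call(node)) {
        return(character(0))
    }
    here <- if (is_unary_plus(node)) deparse(node) else character(0)
    c(here, unlist(lapply(as.list(node), find_unary_plus)))
}

f <- main_var ~ 0 + var1:x + var2:y + var3 + + var4 + (1 | x_y)
find_unary_plus(f)
#> [1] "+var4"
```

Using identical() instead of == is a design choice in this sketch: it always returns a single TRUE or FALSE, which is convenient inside && conditions.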

Putting the pieces together, and cleaning up by coercing pieces back to expressions and formulas,

remove_unary_plus <- function(expr){
    if (length(expr) == 1) {
        # return atomic elements
        return(expr) 
    } else if (length(expr) == 2 && expr[[1]] == expression(`+`)[[1]]) {
        # for unary plus calls, drop the plus and keep cleaning the argument,
        # so stacked or nested unary pluses are removed as well
        return(remove_unary_plus(expr[[2]]))
    } else {
        # otherwise recurse, simplifying the results back to a language object
        clean_expr <- as.call(lapply(expr, remove_unary_plus))

        # if it's a formula, hold on to the environment
        if (inherits(expr, "formula")) {
            clean_expr <- as.formula(clean_expr, env = environment(expr))
        }

        return(clean_expr)
    }
}

f_clean <- remove_unary_plus(f)
f_clean
#> main_var ~ 0 + var1:x + var2:y + var3 + var4 + (0 + main_var | 
#>     x_y) + (0 + add_var | x_y) + (1 | x_y)

And look, it keeps its environment:

str(f)
#> Class 'formula'  language main_var ~ 0 + var1:x + var2:y + var3 + +var4 + (0 + main_var | x_y) +      (0 + add_var | x_y) + (1 | x_y)
#>   ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
str(f_clean)
#> Class 'formula'  language main_var ~ 0 + var1:x + var2:y + var3 + var4 + (0 + main_var | x_y) + (0 +      add_var | x_y) + (1 | x_y)
#>   ..- attr(*, ".Environment")=<environment: R_GlobalEnv>

Obviously this is a bit of a pain for day-to-day formula manipulation, but, well, it's possible, maybe useful for programmatic usage, and (to me, at least) interesting.

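Something like

as.formula(gsub("\\+\\s*\\+", "+", paste(deparse(f), collapse = " ")))

where f is your formula. The paste() call rejoins the pieces when deparse() splits a long formula across several strings.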
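A slightly more defensive sketch of the same string-based idea follows. The wrapper name `strip_double_plus` is made up for this example, and the formula is a shortened version of the one in the question. It joins the deparsed pieces before substituting and passes the original environment back to as.formula():

```r
# Hypothetical wrapper around the string-based approach
strip_double_plus <- function(f) {
    # deparse() may return several strings for a long formula; rejoin them
    txt <- paste(deparse(f), collapse = " ")
    # collapse two pluses separated only by whitespace into a single plus
    txt <- gsub("\\+\\s*\\+", "+", txt)
    # rebuild the formula, keeping the environment of the original
    as.formula(txt, env = environment(f))
}

f <- main_var ~ 0 + var1:x + var2:y + var3 + + var4 + (1 | x_y)
g <- strip_double_plus(f)
g
#> main_var ~ 0 + var1:x + var2:y + var3 + var4 + (1 | x_y)
identical(environment(g), environment(f))
#> [1] TRUE
```

Because this works on text rather than on the syntax tree, it only targets the pattern of two adjacent pluses; the tree-based approach is the safer choice when formulas are generated programmatically.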
