How can I remove two consecutive pluses (+) from a formula/string?

For example, I have a formula like this:

main_var ~ 0 + var1:x + var2:y + var3 + + var4 + (0 + main_var|x_y) + (0 + add_var|x_y) + (1|x_y)

How can I remove the two consecutive pluses (+) between var3 and var4, leaving only one?

It's possible to edit a formula's component parts without coercing it to a string. A formula has two parts: an expression (the part you write) and an environment (where you wrote it, which may hold variables that the expression refers to). We want to keep the environment and change the expression.
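To make the two parts concrete, here is a minimal sketch. The names make_formula, local_var, and g are made up for illustration and do not appear in the question.

```r
# A formula created inside a function remembers that function's environment.
make_formula <- function() {
    local_var <- 1:3   # lives only in the function's environment
    y ~ local_var      # the formula captures that environment
}

g <- make_formula()

# The environment part: where the formula was written
identical(environment(g), globalenv())
#> [1] FALSE
get("local_var", envir = environment(g))
#> [1] 1 2 3

# The expression part: the right-hand side as a language object
g[[3]]
#> local_var
```

Dropping or replacing the environment would break the lookup of local_var, which is why the approach below takes care to carry it over.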

Expressions (by which I mean language objects such as symbols and calls, not the narrowly defined expression class) are syntax trees, which behave a bit like lists. They can be subset:

f <- main_var ~ 0 + var1:x + var2:y + var3 + + var4 + (0 + main_var|x_y) + (0 + add_var|x_y) + (1|x_y)

f[[1]]
#> `~`
f[[2]]
#> main_var
f[[3]]
#> 0 + var1:x + var2:y + var3 + +var4 + (0 + main_var | x_y) + (0 + 
#>     add_var | x_y) + (1 | x_y)
f[[3]][[3]]
#> (1 | x_y)

and therefore iterated over. Because they're tree-like structures, iterating over the whole tree requires recursion. Most of the function is typical for recursion (return atomic leaf nodes; recurse over nodes with children), but the tricky part is the condition that identifies the part we want to change. If you look at the node in question, it contains a unary (one-argument) + call:

f <- main_var ~ 0 + var1:x + var2:y + var3 + + var4 + (0 + main_var|x_y) + (0 + add_var|x_y) + (1|x_y)
f[[3]][[2]][[2]][[2]][[3]]
#> +var4
f[[3]][[2]][[2]][[2]][[3]][[1]]
#> `+`
f[[3]][[2]][[2]][[2]][[3]][[2]]
#> var4

All other + calls are binary. We can thus check for length-2 nodes whose first element is +. As it turns out, getting a + symbol to compare against is also a bit tricky; the simplest is expression(`+`)[[1]] or quote(+1)[[1]], but once you have that, equality checking works as usual.
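As a quick check of that condition, the sketch below builds the + symbol in a few equivalent ways and tests a unary and a binary call. The names plus_sym and is_unary_plus are illustrative only; quote(`+`) and as.name("+") are alternatives not mentioned above.

```r
# Several ways to obtain the `+` symbol; all yield the same object
plus_sym <- quote(`+`)
identical(plus_sym, expression(`+`)[[1]])
#> [1] TRUE
identical(plus_sym, quote(+1)[[1]])
#> [1] TRUE
identical(plus_sym, as.name("+"))
#> [1] TRUE

# A unary plus call has length 2 (the function and one argument);
# a binary plus call has length 3
is_unary_plus <- function(node) {
    is.call(node) && length(node) == 2 && identical(node[[1]], plus_sym)
}

is_unary_plus(quote(+var4))
#> [1] TRUE
is_unary_plus(quote(var3 + var4))
#> [1] FALSE
```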

Putting the pieces together, and cleaning up by coercing the pieces back to expressions and formulas:

remove_unary_plus <- function(expr){
    if (length(expr) == 1) {
        # return atomic elements
        return(expr) 
    } else if (length(expr) == 2 && expr[[1]] == expression(`+`)[[1]]) {
        # for unary plus calls, return the argument without the plus
        return(expr[[2]]) 
    } else {
        # otherwise recurse, simplifying the results back to a language object
        clean_expr <- as.call(lapply(expr, remove_unary_plus))

        # if it's a formula, hold on to the environment
        if (inherits(expr, "formula")) {
            clean_expr <- as.formula(clean_expr, env = environment(expr))
        }

        return(clean_expr)
    }
}

f_clean <- remove_unary_plus(f)
f_clean
#> main_var ~ 0 + var1:x + var2:y + var3 + var4 + (0 + main_var | 
#>     x_y) + (0 + add_var | x_y) + (1 | x_y)

And look, it keeps its environment:

str(f)
#> Class 'formula'  language main_var ~ 0 + var1:x + var2:y + var3 + +var4 + (0 + main_var | x_y) +      (0 + add_var | x_y) + (1 | x_y)
#>   ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
str(f_clean)
#> Class 'formula'  language main_var ~ 0 + var1:x + var2:y + var3 + var4 + (0 + main_var | x_y) + (0 +      add_var | x_y) + (1 | x_y)
#>   ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
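One edge case worth noting: remove_unary_plus returns the argument of a unary plus without recursing into it, so a run of three pluses (a + + + b, which parses as a + (+(+b))) would still leave one unary plus behind. A hedged variant that also recurses into the argument might look like the sketch below; strip_unary_plus and f2 are names invented for this example, and identical() is swapped in for == as a matter of taste.

```r
strip_unary_plus <- function(expr) {
    # symbols and constants are leaves: return them unchanged
    if (!is.call(expr)) {
        return(expr)
    }
    # unary plus: drop the plus, then keep cleaning its argument
    if (length(expr) == 2 && identical(expr[[1]], quote(`+`))) {
        return(strip_unary_plus(expr[[2]]))
    }
    # any other call: recurse over its elements and rebuild the call
    clean_expr <- as.call(lapply(expr, strip_unary_plus))
    # if it's a formula, hold on to the environment
    if (inherits(expr, "formula")) {
        clean_expr <- as.formula(clean_expr, env = environment(expr))
    }
    clean_expr
}

f2 <- y ~ a + + + b
deparse(strip_unary_plus(f2))
#> [1] "y ~ a + b"
```

For the formula in the question, which has only a single stray plus, both versions should give the same result.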

Obviously this is a bit of a pain for day-to-day formula manipulation, but it's possible, may be useful for programmatic usage, and (to me, at least) is interesting.

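Something like

as.formula(gsub("\\+\\s*\\+", "+", paste(deparse(f), collapse = " ")))

where f is your formula. The paste() call joins the pieces that deparse() returns when a long formula is split across several strings.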
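A slightly more defensive sketch of the same string-based idea is shown below. It assumes you also want to keep the original formula's environment; dedupe_plus and f_str are hypothetical names introduced here.

```r
dedupe_plus <- function(f) {
    # deparse() can return several strings for a long formula, so join them
    txt <- paste(deparse(f), collapse = " ")
    # collapse "+ +" (two pluses separated by optional whitespace) into one
    txt <- gsub("\\+\\s*\\+", "+", txt)
    # rebuild the formula, reusing the original environment
    as.formula(txt, env = environment(f))
}

f <- main_var ~ 0 + var1:x + var2:y + var3 + + var4 +
    (0 + main_var|x_y) + (0 + add_var|x_y) + (1|x_y)
f_str <- dedupe_plus(f)
identical(environment(f_str), environment(f))
#> [1] TRUE
```

Unlike the tree-based approach, this relies on how deparse() happens to print the formula, so it is best treated as a quick fix rather than a general tool.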
