How to use reference variables by character string in a formula?

In the minimal example below, I am trying to use the value of a character string, vars, in a regression formula. However, I am only able to pass the string of variable names ("v2+v3+v4") to the formula, not what the string stands for (e.g., "v2" should mean dat$v2).

I know there are better ways to run the regression (e.g., lm(v1 ~ v2 + v3 + v4, data=dat)). My situation is more complex, and I am trying to figure out how to use a character string in a formula. Any thoughts?

Updated code below:

# minimal example 
# create data frame
v1 <- rnorm(10)
v2 <- sample(c(0,1), 10, replace=TRUE)
v3 <- rnorm(10)
v4 <- rnorm(10)
dat <- cbind(v1, v2, v3, v4)
dat <- as.data.frame(dat)

# create objects of column names
c.2 <- colnames(dat)[2]
c.3 <- colnames(dat)[3]
c.4 <- colnames(dat)[4]

# shortcut to get to the type of object my full code produces
vars <- paste(c.2, c.3, c.4, sep="+")

### TRYING TO SOLVE FROM THIS POINT:
print(vars)
# [1] "v2+v3+v4"

# use vars in regression
regression <- paste0("v1", " ~ ", vars)
m1 <- lm(as.formula(regression), data=dat)

Update: @Arun was correct about the missing "" on v1 in the first example. This fixed my example, but I was still having problems with my real code. In the code chunk below, I adapted my example to better reflect my actual code. I initially chose a simpler example because I thought the problem was the string vars.

Here's an example that does not work :) It uses the same data frame dat created above.

dv <- colnames(dat)[1]
r2 <- colnames(dat)[2]
# the following loop creates objects r3, r4, r5, and r6
# r5 and r6 are interaction terms
for (v in 3:4) {
  r <- colnames(dat)[v]
  assign(paste("r",v,sep=""),r)
  r <- paste(colnames(dat)[2], colnames(dat)[v], sep="*")
  assign(paste("r",v+2,sep=""),r)
}

# combine r3, r4, r5, and r6 then collapse and remove trailing +
vars2 <- sapply(3:6, function(i) { 
                paste0("r", i, "+")
                })
vars2 <- paste(vars2, collapse = '')
vars2 <- substr(vars2, 1, nchar(vars2)-1)

# concatenate dv, r2 (as a factor), and vars into `eq`
eq <- paste0(dv, " ~ factor(",r2,") +", vars2)

Here is the issue:

print(eq)
# [1] "v1 ~ factor(v2) +r3+r4+r5+r6"

Unlike regression in the first example, eq does not bring in the column names (e.g., v3). The object names (e.g., r3) are retained instead. As such, the following lm() command does not work.

m2 <- lm(as.formula(eq), data=dat)

I see a couple of issues here. First, though I don't think it is causing any trouble, let's create your data frame in one step so that v1 through v4 don't exist both in the global environment and in the data frame. Second, let's make v2 a factor here so that we don't have to convert it later.

dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(sample(c(0,1), 10, replace=TRUE)),
                  v3 = rnorm(10),
                  v4 = rnorm(10) )

Part One

For your first part, it looks like this is what you want:

lm(v1 ~ v2 + v3 + v4, data=dat)

Here's a simpler way to do that, though you still have to specify the response variable.

lm(v1 ~ ., data=dat)

Alternatively, you certainly can build up the formula with paste and call lm on it.

f <- paste(names(dat)[1], "~", paste(names(dat)[-1], collapse=" + "))
# "v1 ~ v2 + v3 + v4"
lm(f, data=dat)
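
As a side note that is not part of the original answer: base R also provides reformulate(), which builds the formula object directly from a character vector of term names, so there is no string to parse afterwards. A minimal sketch, assuming a data frame shaped like dat above (v2 is made deterministic here only so the example is reproducible):

```r
# Illustrative data; the column names mirror the example above.
set.seed(1)
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# reformulate() takes the right-hand-side terms as a character vector
# and the response as a string, and returns a formula object.
f2 <- reformulate(names(dat)[-1], response = names(dat)[1])
print(f2)
# v1 ~ v2 + v3 + v4

m <- lm(f2, data = dat)
```

This is mostly a convenience; for simple column names it should give the same formula as the paste approach.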

However, my preference in these situations is to use do.call, which evaluates its arguments before passing them to the function; this makes the resulting object more suitable for calling functions like update on. Compare the call part of the output.

do.call("lm", list(as.formula(f), data=as.name("dat")))
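
To make the difference in the call part concrete, here is a small sketch that fits the model both ways and inspects the stored call. The object names m_direct, m_docall, and m_small are illustrative, and the data are regenerated so the block runs on its own:

```r
set.seed(1)
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),
                  v3 = rnorm(10),
                  v4 = rnorm(10))
f <- paste(names(dat)[1], "~", paste(names(dat)[-1], collapse = " + "))

# Passing the string directly: the stored call only remembers the name `f`.
m_direct <- lm(f, data = dat)
print(m_direct$call)
# lm(formula = f, data = dat)

# With do.call the formula is evaluated first, so the call shows the terms.
m_docall <- do.call("lm", list(as.formula(f), data = as.name("dat")))
print(m_docall$call)
# lm(formula = v1 ~ v2 + v3 + v4, data = dat)

# update() then works from a call that spells out the model.
m_small <- update(m_docall, . ~ . - v4)
```

Using as.name("dat") keeps the data argument as a symbol in the call rather than embedding the whole data frame in it.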

Part Two

About your second part, it looks like this is what you're going for:

lm(v1 ~ factor(v2) + v3 + v4 + v2*v3 + v2*v4, data=dat)

First, because v2 is already a factor in the data frame, we don't need the factor() part. Second, this can be simplified further by making better use of R's formula operators for creating interactions, like this.

lm(v1 ~ v2*(v3 + v4), data=dat)
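
One way to check that the compact form really expands to the same model is to look at the term labels R derives from each formula. This sketch needs no data; the names long_form and short_form are illustrative:

```r
long_form  <- v1 ~ v2 + v3 + v4 + v2*v3 + v2*v4
short_form <- v1 ~ v2*(v3 + v4)

# terms() expands the formula operators; the "term.labels" attribute
# lists the resulting main effects and interactions.
attr(terms(long_form), "term.labels")
# [1] "v2"    "v3"    "v4"    "v2:v3" "v2:v4"
attr(terms(short_form), "term.labels")
# [1] "v2"    "v3"    "v4"    "v2:v3" "v2:v4"
```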

I'd then simply create the formula using paste; the loop with assign, even in the larger case, is probably not a good idea.

f <- paste(names(dat)[1], "~", names(dat)[2], "* (", 
           paste(names(dat)[-c(1:2)], collapse=" + "), ")")
# "v1 ~ v2 * ( v3 + v4 )"

It can then be used either with lm directly or with do.call.

lm(f, data=dat)
do.call("lm", list(as.formula(f), data=as.name("dat")))

About your code

The problem you had with r3 etc. was that you wanted the contents of the variable r3, not the name r3 itself. To get the contents, you need get, like this, and then you collapse the values together with paste.

vars <- sapply(paste0("r", 3:6), get)
paste(vars, collapse=" + ")
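
The difference between a variable's name and its contents can be seen in a small self-contained sketch. The values of r3 to r6 below are what the loop in the question would produce; using mget() with an explicit envir argument is a choice made for this sketch (it looks up several names at once), not something the answer prescribes:

```r
# What the question's loop leaves behind:
r3 <- "v3"
r4 <- "v4"
r5 <- "v2*v3"
r6 <- "v2*v4"

# paste0() only produces the *names* of the objects ...
obj_names <- paste0("r", 3:6)
paste(obj_names, collapse = " + ")
# [1] "r3 + r4 + r5 + r6"

# ... while get()/mget() look the names up and return their contents.
contents <- unlist(mget(obj_names, envir = environment()))
rhs <- paste(contents, collapse = " + ")
rhs
# [1] "v3 + v4 + v2*v3 + v2*v4"
```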

However, a better way would be to avoid assign and just build a vector of the terms you want, like this.

vars <- NULL
for (v in 3:4) {
  vars <- c(vars, colnames(dat)[v], paste(colnames(dat)[2], 
                                          colnames(dat)[v], sep="*"))
}
paste(vars, collapse=" + ")

A more R-like solution would be to use lapply:

vars <- unlist(lapply(colnames(dat)[3:4], 
                      function(x) c(x, paste(colnames(dat)[2], x, sep="*"))))
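
Putting the pieces together, the term vector can be collapsed into a formula string and fitted. The sketch below is one possible end-to-end version under the same assumptions as before (illustrative data, v2 already a factor); it also checks that the result matches the compact v2*(v3 + v4) form:

```r
set.seed(1)
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# Each of v3 and v4 contributes a main effect and an interaction with v2.
vars <- unlist(lapply(colnames(dat)[3:4],
                      function(x) c(x, paste(colnames(dat)[2], x, sep = "*"))))

# Collapse the terms and prepend the response to get the formula string.
eq <- paste(colnames(dat)[1], "~", paste(vars, collapse = " + "))
eq
# [1] "v1 ~ v3 + v2*v3 + v4 + v2*v4"

m_built   <- lm(as.formula(eq), data = dat)
m_compact <- lm(v1 ~ v2*(v3 + v4), data = dat)

# Both formulas describe the same model, so the fitted values agree.
all.equal(fitted(m_built), fitted(m_compact))
```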

TL;DR: use paste.

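# ctree() is assumed to come from the partykit package (party also exports one)
library(partykit)

create_ctree <- function(col, data){
    # build e.g. "class ~ ." and convert the string to a formula
    myFormula <- as.formula(paste(col, "~ ."))
    ctree(myFormula, data = data)
}
# `data` is a data frame that contains a column named "class"
create_ctree("class", data)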
