R data.table: lapply or for loop to create variables or generate columns
I want to create several variables using a formula with R data.table. I have a list of variables, and for each one I want to perform a calculation and create a new variable, pasting the same string onto each column name.
I can get it to work for one variable at a time, but not with lapply or a for loop.
I suspect I am missing something about how data.table handles quotation marks, or variable names vs. strings.
Do I need to use ".." or wrap with eval()?
A dplyr (or any tidyverse) solution would solve the issue too.
Here is example code with mtcars:
library(data.table)
mtcars.dt <- setDT(mtcars)
myVars <- c("mpg", "hp", "qsec")
# Doesn't work:
for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.ratio") := myVar / disp]
}
# Doesn't work:
lapply(myVars, function(myVar) mtcars.dt[, paste0(myVar, ".disp.ratio") := myVar / disp])
# Works:
mtcars.dt[, mpg.disp.ratio := mpg / disp]
# Doesn't work
for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.lm.adj") :=
              myVar -
              lm(data = .SD, formula = myVar ~ disp)$coefficients[2] * (disp - mean(disp))]
}
# Doesn't work
lapply(myVars, function(x) mtcars.dt[, paste0(x, ".disp.lm.adj") :=
                                       x -
                                       lm(data = .SD, formula = x ~ disp)$coefficients[2] * (disp - mean(disp))])
# Works
mtcars.dt[, mpg.disp.lm.adj :=
            mpg -
            lm(data = .SD, formula = mpg ~ disp)$coefficients[2] * (disp - mean(disp))]
For the ratio calculation, I get the following error:
Error in myVar/disp : non-numeric argument to binary operator
For the lm adjustment, I get the following error:
Error in model.frame.default(formula = myVar ~ disp, data = .SD, drop.unused.levels = TRUE) :
variable lengths differ (found for 'disp')
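We can use get. Inside j, myVar evaluates to the character string stored in it (for example "mpg"), not to the column, which is why myVar / disp fails with the "non-numeric argument" error; get(myVar) looks up the column with that name.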
library(data.table)
for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.ratio") := get(myVar) / disp]
}
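As a quick sanity check on the get() approach, here is a minimal, self-contained sketch. It assumes we work on a copy made with as.data.table() (the name dt is only illustrative) so the built-in mtcars data frame is not modified, and it compares one generated column with the ratio computed by hand.

```r
library(data.table)

# Work on a copy so the built-in mtcars data frame is left untouched.
dt <- as.data.table(mtcars)
myVars <- c("mpg", "hp", "qsec")

for (myVar in myVars) {
  # Inside j, myVar is just a string such as "mpg";
  # get() fetches the column that carries that name.
  dt[, paste0(myVar, ".disp.ratio") := get(myVar) / disp]
}

# Compare one generated column with the hand-written calculation.
all.equal(dt$mpg.disp.ratio, mtcars$mpg / mtcars$disp)
```

The left-hand side of := can be a call such as paste0(...), which data.table evaluates to obtain the new column name.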
Or wrap with eval after converting to symbol
for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.ratio") := eval(as.name(myVar)) / disp]
}
Another option is to specify the columns in .SDcols, loop over .SD (Subset of Data.table), apply the transformation, and create the new variables by assignment (:=)
mtcars.dt[, paste0(myVars, ".disp.ratio") := lapply(.SD, `/`, disp),
.SDcols = myVars]
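The same .SDcols idea can be written with an explicit anonymous function, which some readers may find easier to follow than passing `/` directly. This is only a sketch: the names dt and newVars are illustrative, and the data is copied with as.data.table() so mtcars itself stays unchanged.

```r
library(data.table)

dt <- as.data.table(mtcars)
myVars <- c("mpg", "hp", "qsec")
newVars <- paste0(myVars, ".disp.ratio")

# .SD holds only the columns named in .SDcols; lapply() divides each one by disp.
# Wrapping newVars in parentheses tells := to use the names stored in the vector.
dt[, (newVars) := lapply(.SD, function(x) x / disp), .SDcols = myVars]

# Inspect the first rows of the new columns.
dt[1:3, ..newVars]
```

This adds all new columns in a single call, with no explicit loop and no need for get().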
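For the second case, we can create the formula with paste. In myVar ~ disp the formula refers to the length-one character vector myVar rather than to the column, hence the "variable lengths differ" error.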
for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.lm.adj") :=
              get(myVar) -
              lm(data = .SD, formula = paste(myVar, "~ disp"))$coefficients[2] *
              (disp - mean(disp))]
}
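A variation on the loop above, offered as a sketch rather than as the answer's own method: base R's reformulate() builds the formula object from strings, and the slope can be fitted outside j and then reused. The names dt, f and slope are illustrative, and the approach assumes no column is itself called slope.

```r
library(data.table)

dt <- as.data.table(mtcars)
myVars <- c("mpg", "hp", "qsec")

for (myVar in myVars) {
  # reformulate("disp", response = "mpg") yields the formula mpg ~ disp.
  f <- reformulate("disp", response = myVar)
  # Fit once outside j and pull out the disp coefficient by name.
  slope <- coef(lm(f, data = dt))[["disp"]]
  dt[, paste0(myVar, ".disp.lm.adj") :=
       get(myVar) - slope * (disp - mean(disp))]
}

# Compare with the hand-written version for mpg.
expected <- mtcars$mpg -
  coef(lm(mpg ~ disp, data = mtcars))[["disp"]] * (mtcars$disp - mean(mtcars$disp))
all.equal(dt$mpg.disp.lm.adj, expected)
```

Extracting the coefficient by name rather than by position makes it explicit which term is being used.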