R data.table: lapply or for loop to create variables or generate columns

I want to create several variables using a formula with R data.table. I have a list of variables, and for each one I want to perform a calculation and create a new variable, pasting the same string onto each column name. I can get it to work for one variable at a time, but not inside lapply or a for loop. I suspect I am missing something about how data.table handles quotation marks, or variable names versus strings. Do I need to use ".." or wrap with eval()? A dplyr (or any tidyverse) solution would also solve the issue.

Here is example code with mtcars:

library(data.table)
mtcars.dt <- setDT(mtcars)
myVars <- c("mpg", "hp", "qsec")

# Doesn't work:
for( myVar in myVars){
  mtcars.dt[, paste0(myVar, ".disp.ratio") := myVar / disp]
}

# Doesn't work:
lapply(myVars, function(myVar) mtcars.dt[, paste0(myVar, ".disp.ratio") := myVar / disp])

# Works:
mtcars.dt[, mpg.disp.ratio := mpg / disp]

# Doesn't work
for (myVar in myVars){
  mtcars.dt[, paste0(myVar, ".disp.lm.adj") := 
              myVar - 
              lm(data = .SD, formula = myVar ~ disp)$coefficients[2] * (disp - mean(disp))]
}

# Doesn't work
lapply(myVars, function(x) mtcars.dt[, paste0(x, ".disp.lm.adj") := 
                                       x - 
                                       lm(data = .SD, formula = x ~ disp)$coefficients[2] * (disp - mean(disp))])

# Works
mtcars.dt[, mpg.disp.lm.adj := 
            mpg - 
            lm(data = .SD, formula = mpg ~ disp)$coefficients[2] * (disp - mean(disp))]

For the ratio calculation, I get the following error:

Error in myVar/disp : non-numeric argument to binary operator 

For the lm adjustment, I get the following error:

Error in model.frame.default(formula = myVar ~ disp, data = .SD, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'disp')

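Inside the loop, myVar holds a column name as a character string, so myVar / disp tries to divide a string by a number. We can use get to look up the column by its name: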

library(data.table)
for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.ratio") := get(myVar) / disp]
}
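As a self-contained check of the get approach, the sketch below works on a copy made with as.data.table (an assumption made here so the built-in mtcars object is not modified by reference) and compares one looped column with the value computed by hand. The name dt is illustrative only.

```r
library(data.table)

# Work on a copy so the built-in mtcars data set is left untouched
dt <- as.data.table(mtcars)
myVars <- c("mpg", "hp", "qsec")

# Each element of myVars is a character string, not a column,
# which is why myVar / disp fails with "non-numeric argument"
is.character(myVars[1])

for (myVar in myVars) {
  # get() looks up the column whose name is stored in myVar
  dt[, paste0(myVar, ".disp.ratio") := get(myVar) / disp]
}

# The looped column matches the one computed by hand
all.equal(dt$mpg.disp.ratio, mtcars$mpg / mtcars$disp)
```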

Or convert the string to a symbol with as.name and wrap it in eval:

for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.ratio") := eval(as.name(myVar)) / disp]
}

Another option is to name the columns in .SDcols, loop over .SD (the Subset of Data.table), apply the transformation, and create the new variables by assignment ( := ), with no explicit loop:

mtcars.dt[, paste0(myVars, ".disp.ratio") := lapply(.SD, `/`, disp), 
             .SDcols = myVars]
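To make the .SDcols version easier to follow, here is a minimal sketch that spells out the function applied to each column and checks one result. It assumes a copy of mtcars made with as.data.table; dt and newNames are illustrative names, not part of the original answer.

```r
library(data.table)

dt <- as.data.table(mtcars)
myVars <- c("mpg", "hp", "qsec")
newNames <- paste0(myVars, ".disp.ratio")

# .SD holds only the columns named in .SDcols; the function is applied
# to each of them, and disp is still visible as an ordinary column in j
dt[, (newNames) := lapply(.SD, function(col) col / disp), .SDcols = myVars]

# Inspect the new columns
head(dt[, ..newNames], 3)

# Same values as the hand-written ratio
all.equal(dt$hp.disp.ratio, mtcars$hp / mtcars$disp)
```

Wrapping the left-hand side in parentheses, as in (newNames), tells data.table to use the character vector stored in newNames as the column names rather than creating a column literally called newNames.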

For the second case, lm takes myVar in the formula literally instead of substituting the column name, so we can build the formula from the string with paste:

for (myVar in myVars) {
  mtcars.dt[, paste0(myVar, ".disp.lm.adj") := 
              get(myVar) - 
              lm(data = .SD, formula = paste(myVar,  "~ disp"))$coefficients[2] *
               (disp - mean(disp))]
}
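A variant of the same idea, sketched under the assumption that building a formula object with reformulate() is acceptable in place of the pasted string. The slope is computed once per variable outside j, and the result is compared with the hand-written version from the question. The names dt, f, slope, and expected are illustrative.

```r
library(data.table)

dt <- as.data.table(mtcars)
myVars <- c("mpg", "hp", "qsec")

for (myVar in myVars) {
  # Build the formula object, e.g. mpg ~ disp, from the column name
  f <- reformulate("disp", response = myVar)
  # Slope of the regression of this column on disp
  slope <- coef(lm(f, data = dt))[["disp"]]
  dt[, paste0(myVar, ".disp.lm.adj") := get(myVar) - slope * (disp - mean(disp))]
}

# Compare with the hand-written version from the question
expected <- mtcars$mpg -
  coef(lm(mpg ~ disp, data = mtcars))[["disp"]] * (mtcars$disp - mean(mtcars$disp))
all.equal(dt$mpg.disp.lm.adj, expected)
```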
