R data.table: using variable column names in RHS operations

Question

How do I use variable column names on the RHS of := operations? For example, given this data.table "dt", I'd like to create two new columns, "first_y" and "first_z", that contain the first observation of the given column for each value of "x".

library(data.table)

dt <- data.table(x = c("one", "one", "two", "two", "three"),
                 y = c("a", "b", "c", "d", "e"),
                 z = c(1, 2, 3, 4, 5))

dt
       x y z
1:   one a 1
2:   one b 2
3:   two c 3
4:   two d 4
5: three e 5

Here's how you would do it without variable column names.

dt[, c("first_y", "first_z") := .(first(y), first(z)), by = x]

dt
       x y z first_y first_z
1:   one a 1       a       1
2:   one b 2       a       1
3:   two c 3       c       3
4:   two d 4       c       3
5: three e 5       e       5

But how would I do this if the "y" and "z" column names are dynamically stored in a variable?

cols <- c("y", "z")

# This doesn't work
dt[, (paste0("first_", cols)) := .(first(cols)), by = x]

# Nor does this
q <- quote(first(as.name(cols[1])))
p <- quote(first(as.name(cols[2])))
dt[, (paste0("first_", cols)) := .(eval(q), eval(p)), by = x]

I've tried numerous other combinations of quote(), eval() and as.name() without success. The LHS of the operation appears to be working as intended and is documented in many places, but I can't find anything about using a variable column name on the RHS. Thanks in advance.

Answer 1

I'm not familiar with the first function (although it looks like something Hadley would define).

dt[, paste0("first_", cols) := lapply(.SD, head, n = 1L), 
   by = x, .SDcols = cols]
#       x y z first_y first_z
#1:   one a 1       a       1
#2:   one b 2       a       1
#3:   two c 3       c       3
#4:   two d 4       c       3
#5: three e 5       e       5
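
The call above relies on .SDcols accepting a character vector: .SD then holds only the columns named in cols, and lapply() returns one value per column for each group. A minimal self-contained sketch of that idea, assuming the data.table package is installed (new_cols is just an illustrative helper name, not part of the original answer):

```r
library(data.table)

dt <- data.table(x = c("one", "one", "two", "two", "three"),
                 y = c("a", "b", "c", "d", "e"),
                 z = c(1, 2, 3, 4, 5))

cols <- c("y", "z")
new_cols <- paste0("first_", cols)   # illustrative helper: "first_y", "first_z"

# .SD is restricted to the columns named in `cols`; lapply() returns a list
# with one length-1 element per column, recycled over each group's rows.
dt[, (new_cols) := lapply(.SD, head, n = 1L), by = x, .SDcols = cols]

print(dt)
```

Wrapping the LHS in parentheses, as the question does, makes it explicit that new_cols is a variable holding the names rather than a literal column name.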

Answer 2

The .SDcols answer is fine for this case, but you can also just use get:

# The function argument is named col so it is not confused with the grouping column x
dt[, paste0("first_", cols) := lapply(cols, function(col) get(col)[1]), by = x]
dt
#       x y z first_y first_z
#1:   one a 1       a       1
#2:   one b 2       a       1
#3:   two c 3       c       3
#4:   two d 4       c       3
#5: three e 5       e       5
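
To see what get is doing here, it may help to look at the single-column case: inside j, get() takes a string and returns the current group's values for the column of that name. A small sketch under that reading, assuming data.table is installed (col and new_col are hypothetical variable names used only for illustration):

```r
library(data.table)

dt <- data.table(x = c("one", "one", "two", "two", "three"),
                 y = c("a", "b", "c", "d", "e"),
                 z = c(1, 2, 3, 4, 5))

col <- "y"                         # hypothetical: one column name held in a variable
new_col <- paste0("first_", col)   # "first_y"

# get(col) resolves the string "y" to the current group's y values;
# [1] keeps the first of them, which is recycled over the group's rows.
dt[, (new_col) := get(col)[1], by = x]

print(dt)
```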

Another alternative is the vectorized version, mget:

dt[, paste0("first_", cols) := setDT(mget(cols))[1], by = x]
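
Here mget(cols) returns a named list holding the current group's values for each column named in cols; the line above turns that list into a data.table and keeps its first row. As a hedged variation on the same idea (not the answerer's exact code), the list can also be passed straight to lapply(). The sketch assumes data.table is installed:

```r
library(data.table)

dt <- data.table(x = c("one", "one", "two", "two", "three"),
                 y = c("a", "b", "c", "d", "e"),
                 z = c(1, 2, 3, 4, 5))

cols <- c("y", "z")

# mget(cols) looks up every name in `cols` at once and returns a list of
# the group's column values; head(n = 1L) keeps the first value of each.
dt[, (paste0("first_", cols)) := lapply(mget(cols), head, n = 1L), by = x]

print(dt)
```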

Answer 1
5 2015-12-23 18:36:18

Answer 2
4 2015-12-23 19:28:00
