
R data.table using lapply on functions defined outside

This question is related to "R - pass fixed columns to lapply function in data.table" and "weighted means by group and column", but is somewhat different.

I would like to have one fixed column interact with all other columns in a data.table. A trivial example to illustrate:

library(data.table)
DT <- data.table(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))
DT[, lapply(c('x1', 'x2'), function(x) get(x) * y)]

Now suppose that the operation is a lot more complicated than multiplication, so I would like to define a standalone function outside the scope of the data.table:

fun <- function(x) {
    return(get(x) * y)
}
DT[, lapply(c('x1', 'x2'), fun)]
Error in get(x) : object 'x1' not found

Obviously there is an issue with variable scope, since the function defined outside the data.table cannot see the variables inside it. Is there any clever trick to define the function outside the data.table and still be able to use lapply?

You will tie yourself in knots if you try to combine references by character string with named variables (and also by referring to "global" variables within functions).

The easiest way forward is to tell get where to look for x (and y).

Here is the function rewritten so that you can tell it where to look.

# x and y are column names (character strings); wherex / wherey say where to look them up
fun <- function(x, y, wherex = parent.frame(), wherey = parent.frame()) {
    return(get(x, wherex) * get(y, wherey))
}
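
To see why the extra "where" argument matters, here is a minimal base R sketch of the same scoping behaviour without data.table. The names lookup_plain, lookup_where, caller and hidden are made up for this illustration and are not part of the original code.

```r
# Hypothetical helpers, for illustration only.

# get() without an environment searches this function's own frame and then
# the environment where the function was defined, not the caller's frame.
lookup_plain <- function(name) get(name)

# Same lookup, but the caller can say where to search.
# The default, parent.frame(), is the frame that called lookup_where().
lookup_where <- function(name, where = parent.frame()) get(name, envir = where)

caller <- function() {
    hidden <- 42  # exists only inside caller()
    plain <- tryCatch(lookup_plain("hidden"), error = function(e) "not found")
    where <- lookup_where("hidden")
    list(plain = plain, where = where)
}

res <- caller()
res$plain  # the plain lookup cannot see caller()'s local variable
res$where  # the lookup with an explicit environment can
```

The same reasoning applies to a function called from inside j: it only sees the columns if it is told which environment (or list) to search.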

data.table inspects the names used in j and only loads the required columns.

In your example, j does not mention any column names (they appear only as character strings), so nothing is made available.

If you include .SD in the expression for j, it will load all columns. You can then pass .SD as the wherex / wherey arguments to the newly defined fun:

DT[, lapply(c('x1', 'x2'), fun, y = 'y' , wherex=.SD, wherey=.SD)]
 #              V1         V2
 #  1: -0.27871200  1.1943170
 #  2: -0.68843421 -1.5719016
 #  3:  1.06968681  2.8358612
 #  4:  0.21201412  1.0127712
 #  5:  0.05392450  0.2487873
 #  6:  0.04473767 -0.1644542
 #  7:  5.37851536  2.9710708
 #  8:  0.23653388  0.9506559
 #  9:  1.96364756 -1.4662968
 # 10: -0.02458077 -0.1197023

Note that you don't really need to wrap this in [.data.table:

results <- setDT(lapply(c('x1','x2'), fun, y='y', wherex=DT,wherey=DT))

will return the same results.
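
As a possible alternative (not part of the answer above), the standalone function can take the column values themselves rather than their names, which sidesteps get entirely. This is only a sketch: the function name times_y and the seed are assumptions made for illustration, and it relies on the standard .SD / .SDcols mechanism of data.table.

```r
library(data.table)

set.seed(1)  # arbitrary seed so the sketch is reproducible
DT <- data.table(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))

# Hypothetical standalone function: takes vectors, not column names.
times_y <- function(x, y) {
    x * y
}

# .SDcols restricts .SD to x1 and x2; y is referenced by name in j,
# so data.table makes it available and it is passed on as a plain vector.
res <- DT[, lapply(.SD, times_y, y = y), .SDcols = c("x1", "x2")]
```

With this form the result columns keep the names x1 and x2 instead of V1 and V2.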
