R data.table using lapply on functions defined outside

This question is related to "R - pass fixed columns to lapply function in data.table" and "weighted means by group and column", but is somewhat different.

I would like to have one fixed column interact with all other columns in a data.table. A trivial example to illustrate:

library(data.table)
DT <- data.table(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))
DT[, lapply(c('x1', 'x2'), function(x) get(x) * y)]

Now suppose that the operation is a lot more complicated than multiplication, so that I would like to define a standalone function outside the scope of the data.table:

fun <- function(x) {
    return(get(x) * y)
}
DT[, lapply(c('x1', 'x2'), fun)]
Error in get(x) : object 'x1' not found

Obviously there is an issue with variable scope, since the function defined outside the data.table cannot see the variables inside it. Is there any clever trick to define the function outside data.table and still be able to use lapply?
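The scoping issue described above can be reproduced without data.table at all. The following is a minimal base R sketch; the names `fun_lookup`, `fun_env`, `caller`, `caller2` and `local_only_value` are hypothetical and chosen only for illustration.

```r
# A function defined at top level resolves names in its own frame and then
# in the environment where it was defined, not in the frame of its caller.
fun_lookup <- function(name) get(name)

caller <- function() {
    local_only_value <- 42          # exists only inside caller()
    fun_lookup("local_only_value")  # fun_lookup cannot see this variable
}

res <- tryCatch(caller(), error = function(e) e)
failed <- inherits(res, "error")    # TRUE: the lookup fails

# Telling get() where to look makes the same lookup succeed.
fun_env <- function(name, where) get(name, envir = where)

caller2 <- function() {
    local_only_value <- 42
    fun_env("local_only_value", environment())
}

found <- caller2()                  # 42
```

The columns of a data.table inside `[` play the same role as `local_only_value` here: they are visible while j is evaluated, but not from a function defined elsewhere.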

You will tie yourself in knots if you try to combine references by character string with references by named variable (and also if you refer to "global" variables within functions).

The easiest way forward would be to define where get is looking for x (and y).

Here is the function rewritten so that you can tell it where to look.

fun <- function(x,y,wherex=parent.frame(),wherey=parent.frame()) {
    return(get(x,wherex) * get(y,wherey))
}
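To see how the rewritten function behaves on its own, here is a small sketch that calls it on a plain data.frame with fixed values instead of random ones. The data.frame `df` and its values are assumptions made for the example. It relies on get() accepting a list-like object as its second argument, which it converts to an environment.

```r
fun <- function(x, y, wherex = parent.frame(), wherey = parent.frame()) {
    return(get(x, wherex) * get(y, wherey))
}

# Hypothetical data, small enough to check by hand
df <- data.frame(y = c(1, 2, 3), x1 = c(10, 20, 30), x2 = c(2, 2, 2))

out1 <- fun("x1", "y", wherex = df, wherey = df)  # 10 40 90
out2 <- fun("x2", "y", wherex = df, wherey = df)  #  2  4  6
```

Both the column to transform and the fixed column are passed by name, so the function no longer depends on where it was defined.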

data.table inspects the names present in j, and only loads the required columns.

In your example, j does not refer to any column by name (the names appear only as character strings), so no columns are made available.

If you include .SD in the expression for j, it will load all columns. You can use .SD as the wherex / wherey arguments to the newly defined fun:

DT[, lapply(c('x1', 'x2'), fun, y = 'y' , wherex=.SD, wherey=.SD)]
 #              V1         V2
 #  1: -0.27871200  1.1943170
 #  2: -0.68843421 -1.5719016
 #  3:  1.06968681  2.8358612
 #  4:  0.21201412  1.0127712
 #  5:  0.05392450  0.2487873
 #  6:  0.04473767 -0.1644542
 #  7:  5.37851536  2.9710708
 #  8:  0.23653388  0.9506559
 #  9:  1.96364756 -1.4662968
 # 10: -0.02458077 -0.1197023

Note that you don't really need to wrap this in [.data.table:

results <- setDT(lapply(c('x1','x2'), fun, y='y', wherex=DT,wherey=DT))

will return the same results.
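As an alternative that is not part of the answer above, one could avoid character-string lookups altogether and pass the columns themselves as arguments, using .SD together with .SDcols. This is only a sketch under that assumption; the function name `scale_by` and the seed are hypothetical.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only to make the example repeatable
DT <- data.table(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))

# Standalone function defined outside the data.table; it receives vectors,
# so it does not need to know anything about column names or scope.
scale_by <- function(x, y) {
    x * y
}

# .SDcols restricts .SD to the columns to transform; the fixed column y
# is referred to by name in j and passed as an extra argument.
res <- DT[, lapply(.SD, scale_by, y = y), .SDcols = c("x1", "x2")]
```

Here the result keeps the column names x1 and x2 rather than V1 and V2, because lapply iterates over the named columns of .SD.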
