
R data.table: assign a new column with a user-defined function

I am puzzled by the following example.

library(data.table)

set.seed(1)
table1 <- data.table(a=sample(10,5,TRUE),b=sample(10,5,TRUE))
function1 <- function(a,b){
  a*b+runif(1)
}
table1[,c:=function1(a,b)]
table1[,d:=unlist(mapply(function1,a,b))]
set(table1,NULL,"e",unlist(mapply(function1,table1[,a],table1[,b])))
table1
    a  b         c         d         e
1:  3  9 27.205975 27.176557 27.717619
2:  4 10 40.205975 40.687023 40.991906
3:  6  7 42.205975 42.384104 42.380035
4: 10  7 70.205975 70.769841 70.777445
5:  3  1  3.205975  3.497699  3.934705

I would like to use the syntax I used to create column c, but with that syntax the number generated by runif(1) is the same for every row. I found two ways around the problem (columns d and e), but I clearly prefer the syntax used for column c. Does anybody have a solution?

Thanks!

In your column c syntax, data.table passes the two columns to function1 as whole vectors, so function1 is called only once. runif(1) returns a single number, which R recycles across the length-5 vector a*b, so every row gets the same random value. To avoid this, pass the length of the vector to runif (or call runif(length(a)), as was suggested):

function1 <- function(a,b, N){
  a*b+runif(N)
}
table1[,c:=function1(a,b, .N)]
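The runif(length(a)) variant mentioned above keeps the two-argument call from the question. Here is a minimal sketch of it; the values of a and b are copied from the printed table rather than regenerated with sample(), since that call may give different numbers on other R versions, and the helper column name noise is only for illustration.

```r
library(data.table)

set.seed(1)
# a and b copied from the table printed in the question
table1 <- data.table(a = c(3, 4, 6, 10, 3), b = c(9, 10, 7, 7, 1))

# Draw one random number per element of `a` instead of a single one
function1 <- function(a, b) {
  a * b + runif(length(a))
}

table1[, c := function1(a, b)]

# The random part of each row; these should now differ from row to row
noise <- table1[, c - a * b]
print(table1)
print(noise)
```

This way the call site stays table1[, c := function1(a, b)], and the function still works when given plain vectors outside of data.table.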

Another option is to evaluate the function row by row (which I suppose is what you had in mind), using the original two-argument function1:

table1[, id:=.I][, c:=function1(a,b), by = id]

But that is not very efficient, because the function is called once per row instead of once for the whole column.
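To get a rough feel for the difference, something like the following sketch could be used. The table size n and the names big, vectorised and scalar are made up for the illustration, and the timings will depend on the machine, so no numbers are claimed here.

```r
library(data.table)

set.seed(1)
n <- 1e4  # hypothetical table size, chosen only for this illustration
big <- data.table(a = sample(10, n, TRUE), b = sample(10, n, TRUE))

# Vectorised version: one call, n random draws
vectorised <- function(a, b) a * b + runif(length(a))
# Scalar version from the question: one random draw per call
scalar <- function(a, b) a * b + runif(1)

# One call for the whole column
t_vec <- system.time(big[, c := vectorised(a, b)])
# One call per row, grouping by row number
t_row <- system.time(big[, d := scalar(a, b), by = seq_len(nrow(big))])

# Elapsed seconds for each approach
print(c(vectorised = t_vec[["elapsed"]], by_row = t_row[["elapsed"]]))
```

Both columns end up with a different random part in each row; the by-row version just pays for one function call per group.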
