
R data.table user-defined function

I am transitioning from data.frame to data.table in R for better performance. A major part of converting my code is moving the custom functions I used to run with apply on a data.frame over to data.table.

Say I have a simple data table, dt1.

x y z
1 9 j
4 1 n
7 1 n

I am trying to calculate a new column in dt1 based on the values of x, y and z. I tried two ways; both give the correct result, but the faster one emits a warning. I want to make sure the warning is nothing serious before I use the faster version when converting my existing code.

(1) dt1[, a := {if ((x < 1) & (y > 3) & (z == "n")) {6} else {7}}]

(2) dt1[, a := {if ((x < 1) & (y > 3) & (z == "n")) {6} else {7}}, by = 1:nrow(dt1)]

Version 1 runs faster than version 2 but emits the warning "the condition has length > 1 and only the first element will be used", although the result looks correct. Version 2 is slightly slower but does not give that warning. I want to make sure version 1 will not give erratic results once I start writing more complicated functions.

Please treat the question as a generic one: the goal is to run a user-defined function that reads several column values in a given row and calculates the new column value for that row.

Thanks for your help.

If 'x', 'y', and 'z' are the columns of 'dt1', try the vectorized ifelse:

dt1[, a:=ifelse(x<1 & y >3 & z=='n', 6, 7)] 
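The reason the vectorized form matters can be seen in a small base R sketch. It reuses the values from the data section of this answer; the variable names `cond`, `a_vectorized` and `a_first_only` are made up for illustration. In the R versions that produced the warning quoted in the question, a plain `if` looked only at the first element of the condition; from R 4.2.0 onward the same call is an error rather than a warning.

```r
# Column values copied from the data section of this answer
x <- c(0L, 1L, 4L, 7L, -2L)
y <- c(4L, 9L, 1L, 1L, 5L)
z <- c("n", "j", "n", "n", "n")

# One logical value per row
cond <- x < 1 & y > 3 & z == "n"

# ifelse() chooses 6 or 7 separately for every element of cond
a_vectorized <- ifelse(cond, 6, 7)

# Mimics what a plain if() did in older R: test cond[1] only,
# then recycle that single answer across all rows
a_first_only <- rep(if (cond[1]) 6 else 7, length(cond))

print(a_vectorized)
print(a_first_only)
```

With this data the two results differ, so version 1 from the question is only correct when every row happens to agree with the first one.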

Or create 'a' with the value 7, then assign 6 to 'a' for the rows selected by the logical index.

dt1[, a := 7][x<1 & y >3 & z=='n', a:=6][]

Using a function

getnewvariable <- function(v1, v2, v3){
   ifelse(v1 <1 & v2 >3 & v3=='n', 6, 7)
}

 dt1[, a:=getnewvariable(x,y,z)][]
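If the custom function really can only handle one row at a time (for example because it uses `if`/`else` internally), it can still be called once per row. The following is a minimal sketch under that assumption; `scalar_rule` is a hypothetical function name, not part of the original question, and the table is rebuilt from the data section so the block runs on its own.

```r
library(data.table)

# Same values as the data section of this answer
dt1 <- data.table(
  x = c(0L, 1L, 4L, 7L, -2L),
  y = c(4L, 9L, 1L, 1L, 5L),
  z = c("n", "j", "n", "n", "n")
)

# Hypothetical scalar function: expects exactly one value per argument
scalar_rule <- function(v1, v2, v3) {
  if (v1 < 1 && v2 > 3 && v3 == "n") 6 else 7
}

# Option 1: mapply() calls scalar_rule once per row and collects the results
dt1[, a := mapply(scalar_rule, x, y, z)]

# Option 2: group by row number so that each group contains a single row
dt1[, b := scalar_rule(x, y, z), by = seq_len(nrow(dt1))]

print(dt1)
```

Both options call the function once per row, so on large tables they will generally be slower than a vectorized function such as getnewvariable.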

data

df1 <- structure(list(x = c(0L, 1L, 4L, 7L, -2L), y = c(4L, 9L, 1L, 
1L, 5L), z = c("n", "j", "n", "n", "n")), .Names = c("x", "y", 
"z"), class = "data.frame", row.names = c(NA, -5L))

dt1 <- as.data.table(df1) 
