I've read many posts on passing column names to a data.table function, but I did not see a post dealing with passing multiple variables to "by". I commonly use code like this to calculate summary statistics by group.
# Data
library(data.table)
dt=mtcars
setDT(dt)
# Summary Stats Example
dt[cyl == 4,
   .(Count = .N,
     Mean = mean(hp),
     Median = median(hp)),
   by = .(am, vs)]
#    am vs Count   Mean Median
# 1:  1  1     7 80.571     66
# 2:  0  1     3 84.667     95
# 3:  1  0     1 91.000     91
I can't get the following function to work:
# Function
myFun <- function(df, i, j, by){
  df[i == 4,
     .(Count = .N,
       Mean = mean(j),
       Median = median(j)),
     by = .(am, by)]
}
myFun(dt,i='cyl',j='hp',by='vs')
Note that I hard-coded "4" and "am" into the function for this example. Using get() worked with a single grouping variable in by, but failed when multiple grouping variables were used. Guidance on how to properly use get/quote/eval/substitute/parse/as.name/etc. when writing data.table functions is appreciated.
Just create a character vector for the by part of data.table, and it will work:
myFun <- function(df, i, j, by){
  df[get(i) == 4,
     .(Count = .N,
       Mean = mean(get(j)),
       Median = median(get(j))),
     by = c(by, 'am')]
}
myFun(dt, i = 'cyl', j = 'hp', by = 'vs')
#    vs am Count     Mean Median
# 1:  1  1     7 80.57143     66
# 2:  1  0     3 84.66667     95
# 3:  0  1     1 91.00000     91
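The same idea extends to any number of grouping columns. The following is only a minimal sketch: summarise_by, value_col, and group_cols are hypothetical names chosen for illustration, and it assumes none of them collides with a column name in the table.

```r
library(data.table)

# Work on a copy so the built-in mtcars data set is left untouched
dt <- as.data.table(mtcars)

# Hypothetical helper: summarise one column within the groups named in
# `group_cols`, a character vector of one or more column names.
summarise_by <- function(df, value_col, group_cols) {
  df[, .(Count  = .N,
         Mean   = mean(get(value_col)),
         Median = median(get(value_col))),
     by = group_cols]   # a character vector is passed to `by` as-is
}

# Two grouping columns, restricted to the 4-cylinder cars
res <- summarise_by(dt[cyl == 4], value_col = "hp", group_cols = c("am", "vs"))
print(res)
```

Because group_cols is an ordinary character vector, the caller decides how many grouping columns to use without the function changing.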
I've accepted sm95's answer. Below is a more complex example/solution that sends a list to the by argument:
# Libraries
library(data.table)
# Data
dt = mtcars
setDT(dt)
# Function to calculate summary statistics
myFun <- function(df, i1var, i1val, i2var, i2val, # i arguments
                  j,                              # j arguments
                  by1var, by2var, by2val){        # by arguments
  df[get(i1var) == i1val & get(i2var) %in% i2val,
     .(Count = .N,
       Mean = mean(get(j)),
       Median = median(get(j))),
     by = .(get(by1var), get(by2var) == by2val)]
} # END Function
# Run function
myFun(dt, i1var = 'cyl', i1val = 4, i2var = 'gear', i2val = c(3,4),
      j = 'hp',
      by1var = 'vs', by2var = 'am', by2val = 1)
# vs am Count Mean Median
# 1: 1 1 6 75.16667 66
# 2: 1 0 3 84.66667 95
# Should match
dt[cyl == 4 & gear %in% c(3,4),
   .(Count = .N,
     Mean = mean(hp),
     Median = median(hp)),
   by = .(vs, am == 1)]
# vs am Count Mean Median
# 1: 1 1 6 75.16667 66
# 2: 1 0 3 84.66667 95
Here is my Cheat Sheet:
1. Pass i, j, and by variables using get(var).
2. Pass i or by criteria directly.
The above may not apply to more complex functions, and may not be optimal.
If by is a vector and NOT a list (e.g., by=c() vs by=.()), then by arguments can be passed directly.
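As a quick check of the vector-versus-list point, here is a small sketch comparing the two forms on the same data. The object names by_vec and by_list are illustrative only.

```r
library(data.table)

dt <- as.data.table(mtcars)

# Character-vector form: column names are plain strings, so they can come
# straight from a function argument or any other variable.
group_cols <- c("am", "vs")
by_vec <- dt[, .(Count = .N, Mean = mean(hp)), by = group_cols]

# List form: columns are written as unquoted symbols inside .()
by_list <- dt[, .(Count = .N, Mean = mean(hp)), by = .(am, vs)]

# Both forms should describe the same groups and the same summaries
all.equal(by_vec, by_list)
```

The vector form is the one that needs no get() around the grouping columns, which is why it is the easier one to use inside a function.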