简体   繁体   中英

R: data.table .dynamic aggregations on column Date columns

I am trying to do a min / max aggregate on a dynamically chosen column in a data.table . It works perfectly for numeric columns but I cannot get it to work on Date columns unless I create a temporary data.table .

It works when I use the name:

dt <- data.table(Index=1:31, Date = seq(as.Date('2015-01-01'), as.Date('2015-01-31'), by='days'))
dt[, .(minValue = min(Date), maxValue = max(Date))]
# minValue   maxValue
# 1: 2015-01-01 2015-01-31

It does not work when I use with=FALSE :

colName = 'Date'
dt[, .(minValue = min(colName), maxValue = max(colName)), with=F]
# Error in `[.data.table`(dt, , .(minValue = min(colName), maxValue = max(colName)),  : 
# could not find function "."

I can use .SDcols on a numeric column:

colName = 'Index'
dt[, .(minValue = min(.SD), maxValue = max(.SD)), .SDcols=colName]
#   minValue maxValue
#  1:        1       31

But I get an error when I do the same thing for a Date column:

colName = 'Date'
dt[, .(minValue = min(.SD), maxValue = max(.SD)), .SDcols=colName]
# Error in FUN(X[[i]], ...) : 
#   only defined on a data frame with all numeric variables

If I use lapply(.SD, min) or sapply() then the dates are changed to numbers.

The following works and does not seem to waste memory and is fast. Is there anything better?

a <- dt[, colName, with=F]
setnames(a, 'a')
a[, .(minValue = min(a), maxValue = max(a))]

On your first attempt:

dt[, .(minValue = min(colName), maxValue = max(colName)), with=F]
# Error in `[.data.table`(dt, , .(minValue = min(colName), maxValue = max(colName)),  : 
# could not find function "."

You should simply read the Introduction to data.table vignette to understand what with= means. It's easier if you're aware of with() function from base R.

On the second one:

dt[, .(minValue = min(.SD), maxValue = max(.SD)), .SDcols=colName]
# Error in FUN(X[[i]], ...) : 
#   only defined on a data frame with all numeric variables

This seems like an issue with min() and max() on a data.frame/data.table with column with attributes. Here's a MRE.

df = data.frame(x=as.Date("2015-01-01"))
min(df)
# Error in FUN(X[[i]], ...) : 
#   only defined on a data frame with all numeric variables

To answer your question, you can use get() :

dt[, .(min = min(get(colName)), max = max(get(colName)))]

Or as @Frank suggested, [[ operator to subset the column:

dt[, .(min = min(.SD[[colName]]), max = max(.SD[[colName]]))]

There's not yet a nicer way of applying .SD to multiple functions (because base R doesn't seem to have one AFAICT, and data.table tries to use base R functions as much as possible). There's a FR #1063 to address this issue. If/when that gets implemented, then one could do, for example:

# NOTE: not yet implemented, FR #1063
dt[, colwise(.SD, min, max), .SDcols = colName]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM