简体   繁体   中英

In R, aggregate statistics with data.table when the group by variable is a string

I got this data.table set :

set.seed(1234)
r=100
ref=c("banana","chocolate","apple")
ref2=c("florida","california")
products=sample(ref,r,TRUE)
cities=sample(ref2,r,TRUE)
quantity=sample(seq(1:5),100,TRUE)
dt=data.table(products,quantity,cities)
head(dt,5)

  products quantity     cities
1:    apple        3    florida
2:   banana        5    florida
3:   banana        3    florida
4:    apple        2 california
5:    apple        5    Florida

I want a function which take the index of the group by variable as an argument. Like this :

sumfn=function(dt,index){
  var_group=c("products","cities")[index]
  res=dt[,list(Q=sum(quantity)),by=list(var_group)]
  return(res)
}

So that, if I ran this command :

count_dt=sumfn(dt,1)

My output would be :

    products   Q
1: chocolate  72
2:    banana  91
3:     apple 136

However, when I do run the command above, instead, I get this error message :

Error in `[.data.table`(dt, , list(Q = sum(quantity)), by = list(var_group)) : 
  The items in the 'by' or 'keyby' list are length (1). Each must be same length as rows in x or number of rows returned by i (100).

Anyone know how to fix that? I'm stuck with this problem for an hour.

Remove list in the by clause, ie,

sumfn <- function(dt, index) {
    var_group <- c("products", "cities")[index]
    res <- dt[, list(Q = sum(quantity)), by = var_group]
    return(res)
}

and you will get

sumfn(dt,1)
    products   Q
1:     apple 116
2:    banana  83
3: chocolate 102

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM