简体   繁体   中英

Aggregate Multiple Columns with Different Formula in large data.tables using Data.table r

I have a very large data set with many columns. I need to aggregate this data and perform different functions on each set of columns. I have many columns so it is cumbersome to manually specify each function to be used with each column in the aggregation. I have read many post on how to aggregate with data.table but none seem to handle this situation.

Example

Lets work with the mpg dataset. I want to aggregate by cyl and trans . I want to return the average for the cty and hwy columns but want to return the first value from manufacturer and model . In reality I would apply to many more columns by specifying cols<-names(data)[10:50] and .SDcols=cols . I can process each set of cols separately and then merge the data afterwards, but even this gets messy when you have dozens of different column groups. I believe there must be a better way with data.table .

require("data.table")
require("ggplot2") #for the mpg dataset
dat1<-data.table(mpg)
perf<-dat1[, lapply(.SD, ave), .SDcols=c("cty","hwy"), by=list(cyl,trans)] #Aggregate performance data.
info<-dat1[, lapply(.SD, first), .SDcols=c("manufacturer","model"), by=list(cyl,trans)] #Aggregate model data.
perf[info, on=list(cyl,trans)] #Merge data

> head(perf[info, on=list(cyl,trans)])
   cyl      trans cty hwy manufacturer model
1:   4   auto(l5)   6   6         audi    a4
2:   4 manual(m5)  33  33         audi    a4
3:   4 manual(m6)   7   7         audi    a4
4:   4   auto(av)   2   2         audi    a4
5:   6   auto(l5)  16  16         audi    a4
6:   6 manual(m5)  18  18         audi    a4

Question

How can I perform the above in 1 operation? What I am thinking is to subset each group and then do something like this. Maybe a bit overambitious but I'm just shooting for the stars I guess.

dat1[, .(names(group1)=lapply(group1, ave),names(group2)=lapply(group2, first)), by=list(cyl,trans)]
group1 = c("cty", "hwy")
group2 = c("manufacturer", "model")

dat1[, c(
  lapply(.SD[, ..group1], mean), 
  lapply(.SD[, ..group2], first)
), by=.(cyl, trans)]

gives

    cyl      trans      cty      hwy manufacturer              model
 1:   4   auto(l5) 20.33333 31.00000         audi                 a4
 2:   4 manual(m5) 21.54545 29.27273         audi                 a4
 3:   4 manual(m6) 21.00000 29.57143         audi                 a4
 4:   4   auto(av) 22.00000 30.50000         audi                 a4
 5:   6   auto(l5) 15.18750 21.43750         audi                 a4
 6:   6 manual(m5) 16.66667 22.66667         audi                 a4
 7:   6   auto(av) 18.66667 26.00000         audi                 a4
 8:   4   auto(s6) 20.50000 28.25000         audi         a4 quattro
 9:   6   auto(s6) 17.40000 26.00000         audi         a4 quattro
10:   6 manual(m6) 16.00000 22.60000         audi         a4 quattro
11:   8   auto(s6) 13.60000 20.40000         audi         a6 quattro
12:   8   auto(l4) 12.20000 16.73333    chevrolet c1500 suburban 2wd
13:   8 manual(m6) 13.42857 20.00000    chevrolet           corvette
14:   4   auto(l4) 20.50000 27.62500    chevrolet             malibu
15:   6   auto(l4) 16.03448 22.68966    chevrolet             malibu
16:   4   auto(l3) 21.00000 27.00000        dodge        caravan 2wd
17:   6   auto(l6) 16.00000 23.00000        dodge        caravan 2wd
18:   8   auto(l5) 12.29412 16.41176        dodge  dakota pickup 4wd
19:   8 manual(m5) 13.00000 18.80000        dodge  dakota pickup 4wd
20:   8   auto(l6) 12.50000 18.50000         ford     expedition 2wd
21:   8   auto(s5) 12.00000 18.00000       nissan     pathfinder 4wd
22:   8   auto(s4) 16.00000 25.00000      pontiac         grand prix
23:   4   auto(s4) 20.00000 26.00000       subaru        impreza awd
24:   4   auto(s5) 22.00000 31.00000       toyota       camry solara
25:   6   auto(s5) 18.00000 27.00000       toyota       camry solara
26:   5   auto(s6) 20.50000 29.00000   volkswagen              jetta
27:   5 manual(m5) 20.50000 28.50000   volkswagen              jetta
    cyl      trans      cty      hwy manufacturer              model

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM