
R data.table Group By Created Column

I am new to the awesome data.table package and am running into an issue which hopefully has a simple solution. I want to filter a data.table, add some columns to it, and group by some of its columns, including one of the columns I created in my j clause.

If I were using dplyr, it would go something like this:

library(dplyr)

mtcars %>% 
    filter(vs == 1) %>% 
    mutate(trans = ifelse(am == 1, "Manual", "Auto")) %>% 
    group_by(gear, carb, trans) %>% 
    summarise(num_cars = n(),
              avg_qsec = mean(qsec))

# A tibble: 6 x 5
# Groups:   gear, carb [?]
   gear  carb trans  num_cars avg_qsec
  <dbl> <dbl> <chr>     <int>    <dbl>
1     3     1 Auto          3     19.9
2     4     1 Manual        4     19.2
3     4     2 Auto          2     21.4
4     4     2 Manual        2     18.6
5     4     4 Auto          2     18.6
6     5     2 Manual        1     16.9

My attempt with data.table doesn't work.

library(data.table)

dtmt <- as.data.table(mtcars)

dtmt[vs == 1, 
     .(num_cars = .N, 
       avg_qsec = mean(qsec), 
       trans = ifelse(am == 1, 
                      "Manual", "Auto")),
     by = list(gear, carb, trans)]


Error in eval(bysub, xss, parent.frame()) : object 'trans' not found

So the column I create in my j clause can't be used in the by? It works fine if I don't try to transform the am column.

dtmt[vs == 1, 
     .(num_cars = .N, 
       avg_qsec = mean(qsec)),
     by = list(gear, carb, am)]

   gear carb am num_cars avg_qsec
1:    4    1  1        4    19.22
2:    3    1  0        3    19.89
3:    4    2  0        2    21.45
4:    4    4  0        2    18.60
5:    4    2  1        2    18.56
6:    5    2  1        1    16.90
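Since the working call above already groups by am, and the recoding from am to trans is one-to-one (0 becomes "Auto", 1 becomes "Manual"), one possible workaround is to aggregate by am and recode afterwards. This is a sketch of that idea, not part of the original answer; the result should contain the same six groups.

```r
library(data.table)

dtmt <- as.data.table(mtcars)

# Aggregate by the original am column, which by can see directly
res <- dtmt[vs == 1,
            .(num_cars = .N, avg_qsec = mean(qsec)),
            by = .(gear, carb, am)]

# Recode am after aggregation; because the mapping is one-to-one,
# the groups are the same as grouping by trans directly
res[, trans := ifelse(am == 1, "Manual", "Auto")]
res[, am := NULL]
setcolorder(res, c("gear", "carb", "trans", "num_cars", "avg_qsec"))
res
```

This only works because each am value maps to exactly one trans label; for a many-to-one recoding, group by the new column instead.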

Thanks!

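The by expressions are evaluated before j, so a column created in j does not exist yet when the groups are formed. One way around this is to filter the rows where 'vs' is 1, create the 'trans' column, and then use it as the grouping variable for the summary: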

dtmt[vs==1 # subset the rows
    ][, trans := c("Auto", "Manual")[(am==1)+1] # create trans
     ][, .(num_cars = .N, avg_qsec = mean(qsec)), by = .(gear, carb, trans)]
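One detail worth noting about the chained version: dtmt[vs == 1] returns a new data.table, so the := that follows adds 'trans' to that temporary subset, not to dtmt. The sketch below illustrates this and, as an alternative not taken from the original answer, adds 'trans' by reference to the full table first so that it persists on dtmt.

```r
library(data.table)

dtmt <- as.data.table(mtcars)

# Chained form: := modifies the subset copy returned by dtmt[vs == 1]
chained <- dtmt[vs == 1][, trans := c("Auto", "Manual")[(am == 1) + 1]]
"trans" %in% names(dtmt)  # FALSE: dtmt itself is unchanged

# Alternative: add trans to dtmt by reference, then filter and group in one call
dtmt[, trans := c("Auto", "Manual")[(am == 1) + 1]]
res <- dtmt[vs == 1,
            .(num_cars = .N, avg_qsec = mean(qsec)),
            by = .(gear, carb, trans)]
res
```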

It is also possible to do everything in a single [] call, because by accepts named expressions as well as plain column names:

as.data.table(mtcars)[
    vs == 1,
    .(num_cars = .N, avg_qsec = mean(qsec)),
    by = .(gear, carb, trans = ifelse(am == 1, "Manual", "Auto"))]

#    gear carb  trans num_cars avg_qsec
# 1:    4    1 Manual        4    19.22
# 2:    3    1   Auto        3    19.89
# 3:    4    2   Auto        2    21.45
# 4:    4    4   Auto        2    18.60
# 5:    4    2 Manual        2    18.56
# 6:    5    2 Manual        1    16.90
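The rows above come out in order of first appearance, unlike the sorted dplyr result in the question. If the same ordering is wanted, one option is to swap by for keyby, which also sorts the result by the grouping columns and sets them as the key. A minimal sketch:

```r
library(data.table)

dtmt <- as.data.table(mtcars)

# keyby groups like by, then sorts the result by gear, carb, trans
# and sets those columns as the key of the result
res <- dtmt[vs == 1,
            .(num_cars = .N, avg_qsec = mean(qsec)),
            keyby = .(gear, carb, trans = ifelse(am == 1, "Manual", "Auto"))]
res
```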
