
Use "group_by" and "summarise" on some columns while keeping additional columns in the data frame

I am trying to use group_by in R and then summarise, while keeping extra columns in the data.

I just want to group by id_trayecto, but I also want to keep the other columns that are currently inside the group_by. The thing is that I don't know how to include them without putting them inside the group_by.

This is my code:

library(dplyr)

prueba <- reservas %>%
    group_by(id_trayecto, id_trayecto_dia, Van, fecha, hora) %>%
    summarize(Tickets.Vendidos = n(),
              Revenue = sum(Costo.final))

Thanks in advance guys:))

The answer really depends on the cardinality between id_trayecto and the other columns in reservas that you want to keep.

To simplify, let's say you only want to keep id_trayecto and fecha after summarizing. If reservas can contain multiple values of fecha for a given value of id_trayecto, then what you want to do doesn't make sense: summarizing by id_trayecto would have to collapse several values of fecha into a single row. So you'd either have to leave fecha out, or include it in the summarize statement with an appropriate aggregation function.
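As a rough sketch of the second option, here is what aggregating fecha inside summarise could look like. The toy reservas data frame is made up for illustration (the real one isn't shown in the question), and min() and n_distinct() are just two possible aggregation choices, not necessarily the right ones for your data.

```r
library(dplyr)

# Hypothetical toy data: id_trayecto 1 has two different fecha values
reservas <- data.frame(
  id_trayecto = c(1, 1, 2, 2, 2),
  fecha       = as.Date(c("2020-01-01", "2020-01-02",
                          "2020-01-03", "2020-01-03", "2020-01-03")),
  Costo.final = c(10, 20, 5, 5, 5)
)

prueba <- reservas %>%
  group_by(id_trayecto) %>%
  summarise(
    Tickets.Vendidos = n(),               # rows per id_trayecto
    Revenue          = sum(Costo.final),  # total revenue per id_trayecto
    primera_fecha    = min(fecha),        # one possible aggregation: earliest fecha
    n_fechas         = n_distinct(fecha)  # how many distinct fecha values were collapsed
  )

print(prueba)
```

The n_fechas column is only there to make visible how many fecha values got squashed into each row; drop it if you don't need it.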

If, however, reservas only ever contains one value of fecha for a given value of id_trayecto, then you can just include fecha in the group_by statement without changing the values of Tickets.Vendidos and Revenue. In other words: grouping and summarizing by the more granular variable is equivalent to grouping and summarizing by both variables.
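A minimal sketch of how you might check that assumption and see the equivalence, again on made-up toy data (the real reservas isn't shown). The .groups = "drop" argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data: every id_trayecto has exactly one fecha
reservas <- data.frame(
  id_trayecto = c(1, 1, 2, 2, 2),
  fecha       = as.Date(c("2020-01-01", "2020-01-01",
                          "2020-01-03", "2020-01-03", "2020-01-03")),
  Costo.final = c(10, 20, 5, 5, 5)
)

# Check the cardinality: is there only one distinct fecha per id_trayecto?
fechas_por_trayecto <- reservas %>%
  group_by(id_trayecto) %>%
  summarise(n_fechas = n_distinct(fecha))
una_fecha <- all(fechas_por_trayecto$n_fechas == 1)

# Grouping by id_trayecto alone
solo_id <- reservas %>%
  group_by(id_trayecto) %>%
  summarise(Tickets.Vendidos = n(),
            Revenue = sum(Costo.final))

# Grouping by id_trayecto and fecha: same rows and totals, plus the fecha column
con_fecha <- reservas %>%
  group_by(id_trayecto, fecha) %>%
  summarise(Tickets.Vendidos = n(),
            Revenue = sum(Costo.final),
            .groups = "drop")

print(una_fecha)
print(con_fecha)
```

If una_fecha comes out FALSE on your real data, the two results will differ, and you are back in the first case where fecha needs its own aggregation.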
