I am trying to use `group_by` in R, then `summarise`, while keeping extra columns in the data. I just want to group by `id_trayecto`, but I also want to keep the other columns that are currently inside the `group_by`. The thing is that I don't know how to include them without them being inside the `group_by`.
This is my code:

```r
library(dplyr)

prueba <- reservas %>%
  group_by(id_trayecto, id_trayecto_dia, Van, fecha, hora) %>%
  summarize(Tickets.Vendidos = n(),
            Revenue = sum(Costo.final))
```
Thanks in advance guys:))
The answer really depends on the cardinality of the relationship between `id_trayecto` and the other columns in `reservas` that you want to keep.
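One way to find out which case applies is to count the distinct values of each extra column per `id_trayecto`. The sketch below does this for `fecha` with dplyr's `n_distinct()`. The small `reservas` tibble is made-up toy data (only the column names come from the question), and `fecha_counts` / `one_fecha_per_trip` are hypothetical names.

```r
library(dplyr)

# Hypothetical toy stand-in for `reservas`; only the column names come from the question.
reservas <- tibble(
  id_trayecto = c(1, 1, 2, 2, 2),
  fecha       = as.Date(c("2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02", "2021-03-03")),
  Costo.final = c(10, 12, 8, 8, 9)
)

# For each id_trayecto, count how many distinct fecha values it has.
fecha_counts <- reservas %>%
  group_by(id_trayecto) %>%
  summarise(n_fechas = n_distinct(fecha), .groups = "drop")

# TRUE only if every id_trayecto maps to exactly one fecha.
one_fecha_per_trip <- all(fecha_counts$n_fechas == 1)

print(fecha_counts)
print(one_fecha_per_trip)
```

In this toy data trip 2 has two dates, so `one_fecha_per_trip` is `FALSE` and `fecha` cannot simply be added to the grouping without changing the result.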
To simplify, let's say you only want to keep `id_trayecto` and `fecha` after summarizing. If `reservas` can, in general, contain multiple values of `fecha` for a given value of `id_trayecto`, then what you want to do doesn't make sense: summarizing by `id_trayecto` may mean summarizing across several values of `fecha`, so you'd either have to leave `fecha` out, or include it in the `summarize` statement with an appropriate aggregation function (for example `min()`, `max()` or `first()`).
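A minimal sketch of the second option, assuming toy data in which one trip (`id_trayecto == 2`) spans two dates: `fecha` is summarized with `min()` and `max()` instead of being added to `group_by`. The data and the `fecha_min` / `fecha_max` column names are illustrative assumptions; which aggregation is appropriate depends on what `fecha` means for a trip.

```r
library(dplyr)

# Hypothetical toy data: trip 2 spans two different dates.
reservas <- tibble(
  id_trayecto = c(1, 1, 2, 2, 2),
  fecha       = as.Date(c("2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02", "2021-03-03")),
  Costo.final = c(10, 12, 8, 8, 9)
)

prueba <- reservas %>%
  group_by(id_trayecto) %>%
  summarise(
    Tickets.Vendidos = n(),
    Revenue          = sum(Costo.final),
    # fecha is collapsed with an explicit aggregation instead of being grouped on;
    # min/max is just one choice, first(fecha) or n_distinct(fecha) are others.
    fecha_min        = min(fecha),
    fecha_max        = max(fecha),
    .groups = "drop"
  )

print(prueba)
```

Each trip still gets exactly one row, and the date range makes it visible when a trip covers more than one `fecha`.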
If, however, `reservas` only ever contains a single value of `fecha` for a given value of `id_trayecto`, then you can just include `fecha` in the `group_by` statement without it changing the values of `Tickets.Vendidos` and `Revenue`. In other words, when each `id_trayecto` determines its `fecha`, grouping and summarizing by the more granular variable (`id_trayecto`) is equivalent to grouping and summarizing by both variables.
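To illustrate the equivalence, the sketch below uses hypothetical toy data where every `id_trayecto` has exactly one `fecha`, and compares the summary grouped by `id_trayecto` alone with the one grouped by `id_trayecto` and `fecha`. The names `by_id`, `by_both` and `same_metrics` are made up for the example.

```r
library(dplyr)

# Hypothetical toy data where each id_trayecto has exactly one fecha.
reservas <- tibble(
  id_trayecto = c(1, 1, 2, 2, 2),
  fecha       = as.Date(c("2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02", "2021-03-02")),
  Costo.final = c(10, 12, 8, 8, 9)
)

# Group by the granular key only.
by_id <- reservas %>%
  group_by(id_trayecto) %>%
  summarise(Tickets.Vendidos = n(), Revenue = sum(Costo.final), .groups = "drop")

# Group by the key plus the column we want to keep.
by_both <- reservas %>%
  group_by(id_trayecto, fecha) %>%
  summarise(Tickets.Vendidos = n(), Revenue = sum(Costo.final), .groups = "drop")

# Same number of rows and the same metric values; by_both just carries fecha along.
same_metrics <- nrow(by_id) == nrow(by_both) &&
  identical(by_id$Tickets.Vendidos, by_both$Tickets.Vendidos) &&
  identical(by_id$Revenue, by_both$Revenue)

print(by_both)
print(same_metrics)
```

If the distinct-count check on your real data shows more than one `fecha` for some `id_trayecto`, `by_both` would instead have more rows than `by_id`, and the metrics would be split across dates.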