Let's say I have a data frame df with three columns: revenue (int), quarter (factor with 4 levels), and product (factor with 3 levels).
df <- data.frame(
  revenue = sample(500:5000, 50, replace = TRUE),
  quarter = sample(c("q1", "q2", "q3", "q4"), 50, replace = TRUE),
  product = sample(c("book", "movie", "tv"), 50, replace = TRUE),
  stringsAsFactors = TRUE)  # R >= 4.0 no longer converts strings to factors by default
It is easy to use tapply to group by either quarter or product and apply a function to revenue, like this:
quarterly_revenue <- tapply(df$revenue, df$quarter, sum)
which gives me the sum of revenue per quarter.
However, this is my question: what if I want it more granular, i.e. the sum of each product's revenue per quarter? I've tried using the split function to create a list of data frames, and various plyr solutions, but none give me the output I'm looking for. I know I could subset on each factor, but that seems inefficient, particularly when the actual data set I'm working with has many more factor levels.
Any ideas? Thanks for the help!
We place the grouping columns in a list and get the sum:
tapply(df$revenue, list(df$quarter, df$product), sum)
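As a rough illustration of what that call returns: with two grouping vectors, tapply gives back a matrix with one row per quarter and one column per product, not a vector. The sketch below uses a small hand-made data frame called toy (the name and values are invented for illustration, not taken from the question) and also shows one way to reshape the matrix into long form with as.data.frame(as.table(...)).

```r
# Toy data, invented for illustration only
toy <- data.frame(
  revenue = c(100, 200, 300, 400, 500, 600),
  quarter = c("q1", "q1", "q2", "q2", "q1", "q2"),
  product = c("book", "movie", "book", "movie", "book", "movie")
)

# Two grouping vectors give a quarter-by-product matrix of sums
by_both <- tapply(toy$revenue, list(toy$quarter, toy$product), sum)

# Individual cells are addressed by level names
by_both["q1", "book"]   # 100 + 500 = 600

# Long form: one row per quarter/product combination
long <- as.data.frame(as.table(by_both), responseName = "revenue")
names(long)[1:2] <- c("quarter", "product")
```

Quarter/product combinations that never occur in the data show up as NA in the matrix, which is worth keeping in mind when there are many factor levels.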
It would be much easier with aggregate:
aggregate(revenue~., df, sum)
or with dplyr (data.table would work as well):
library(dplyr)
df %>%
group_by(quarter, product) %>%
summarise(Sum = sum(revenue))
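A note on the aggregate formula used in this answer: in revenue ~ ., the dot stands for every other column of df, so it groups by quarter and product only because those are the only remaining columns. A minimal sketch of the explicit form, on made-up toy data with a hypothetical extra column region, which keeps working when the data frame has more columns than the two grouping factors:

```r
# Toy data, invented for illustration only
toy <- data.frame(
  revenue = c(100, 200, 300, 400, 500, 600),
  quarter = c("q1", "q1", "q2", "q2", "q1", "q2"),
  product = c("book", "movie", "book", "movie", "book", "movie"),
  region  = c("east", "west", "east", "west", "west", "east")  # hypothetical extra column
)

# Name the grouping columns explicitly; region is not used for grouping
sums <- aggregate(revenue ~ quarter + product, data = toy, FUN = sum)
```

The result is a data frame with one row per quarter/product combination and the columns quarter, product and revenue.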
You can use data.table with a by parameter:
library( data.table )
setDT( df )[ , quarterly_revenue := sum( revenue ),
by = .( quarter, product ) ]
Or, to summarise (instead of just adding a column):
library( data.table )
library( magrittr )
setDT( df )[ , sum( revenue ),
by = .( quarter, product ) ] %>%
setnames( c( "quarter", "product", "quarterly_revenue" ) )
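The rename step can probably be folded into the summary itself by naming the result inside .() in j, which also removes the need for magrittr. A minimal sketch on made-up toy data (the name toy and its values are invented for illustration), assuming data.table is installed:

```r
library(data.table)

# Toy data, invented for illustration only
toy <- data.table(
  revenue = c(100, 200, 300, 400, 500, 600),
  quarter = c("q1", "q1", "q2", "q2", "q1", "q2"),
  product = c("book", "movie", "book", "movie", "book", "movie")
)

# Naming the expression inside .() sets the output column name directly
res <- toy[, .(quarterly_revenue = sum(revenue)), by = .(quarter, product)]
```

Here res has one row per quarter/product combination, with the columns quarter, product and quarterly_revenue.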