Applying functions to groups within groups in R

Let's say I have a data frame df with three columns: revenue (int), quarter (factor with 4 levels), and product (factor with 3 levels).

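# 50 rows in every column; stringsAsFactors = TRUE makes quarter and
# product factors (R >= 4.0 keeps strings as character by default)
df <- data.frame(
     revenue = sample(500:5000, 50, replace = TRUE),
     quarter = sample(c("q1", "q2", "q3", "q4"), 50, replace = TRUE),
     product = sample(c("book", "movie", "tv"), 50, replace = TRUE),
     stringsAsFactors = TRUE)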

It is easy to use tapply to group by either quarter or product and apply a function to revenue, like this:

quarterly_revenue <- tapply(df$revenue, df$quarter, sum)

which gives me the sum of revenue per quarter.

However, here is my question: what if I want something more granular, i.e. the sum of each product's revenue per quarter? I've tried using split to create a list of data frames, along with various plyr solutions, but none of them give me the output I'm looking for. I know I could subset on each factor level, but that seems inefficient, particularly since the actual data set I'm working with has many more factor levels.

Any ideas? Thanks for the help!

We can place the grouping columns in a list and get the sum:

tapply(df$revenue, list(df$quarter, df$product),  sum)
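To make the shape of that result concrete, here is a minimal sketch on a small hand-made data frame (toy and its values are made up for illustration, not taken from the question). With two grouping vectors, tapply returns a quarter-by-product matrix, and combinations that never occur come back as NA. If a long table is more convenient, going through as.table is one way to get it.

```r
# Hypothetical toy data with known totals
toy <- data.frame(
  revenue = c(100, 200, 300, 400, 500),
  quarter = c("q1", "q1", "q2", "q2", "q2"),
  product = c("book", "book", "book", "tv", "tv")
)

# Two grouping vectors in a list give a matrix: quarters as rows, products as columns
by_both <- tapply(toy$revenue, list(toy$quarter, toy$product), sum)
by_both
# q1/book = 300, q2/book = 300, q2/tv = 900; q1/tv has no rows, so it is NA

# Reshape the matrix to long format: one row per quarter/product cell
long <- as.data.frame(as.table(by_both), responseName = "revenue")
names(long)[1:2] <- c("quarter", "product")
long
```

The long form keeps the empty q1/tv cell as a row with NA revenue, which can be dropped with na.omit(long) if it is not wanted.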

It would be much easier with aggregate, where the . in the formula means "group by every other column":

aggregate(revenue~., df, sum)
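One caveat with the . shorthand: if the real data set has more columns than the example, every one of them becomes a grouping variable. A minimal sketch, assuming a hypothetical extra region column, shows the difference and how naming the grouping columns avoids it.

```r
# Hypothetical toy data; `region` is an extra column we do NOT want to group by
toy <- data.frame(
  revenue = c(100, 200, 300, 400, 500),
  quarter = c("q1", "q1", "q2", "q2", "q2"),
  product = c("book", "book", "book", "tv", "tv"),
  region  = c("north", "south", "north", "south", "north")
)

# `.` groups by quarter, product AND region, so nothing is collapsed here
too_fine <- aggregate(revenue ~ ., data = toy, FUN = sum)

# Naming the grouping columns gives one row per quarter/product pair
totals <- aggregate(revenue ~ quarter + product, data = toy, FUN = sum)
totals
# q1/book = 300, q2/book = 300, q2/tv = 900
```

Unlike the tapply matrix, the formula interface only returns combinations that actually occur in the data.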

or with dplyr or data.table:

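library(dplyr)

# Group by both factors, then total revenue within each quarter/product pair
df %>%
    group_by(quarter, product) %>%
    summarise(Sum = sum(revenue), .groups = "drop")  # return an ungrouped result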

You can use data.table with a by parameter:

library( data.table )
setDT( df )[ , quarterly_revenue := sum( revenue ), 
               by = .( quarter, product ) ] 

Or, to summarise (instead of just adding a column):

library( data.table )
library( magrittr )

setDT( df )[ , sum( revenue ), 
               by = .( quarter, product ) ] %>%
    setnames( c( "quarter", "product", "quarterly_revenue" ) )
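As an alternative to renaming afterwards, the summary column can be named directly inside .(), which also removes the need for magrittr. The sketch below uses a hypothetical toy table with known totals (toy and group_revenue are illustrative names) and contrasts the two forms: := keeps every row and adds a column by reference, while a named expression in .() returns one row per group.

```r
library(data.table)

# Hypothetical toy data with known totals
toy <- data.table(
  revenue = c(100, 200, 300, 400, 500),
  quarter = c("q1", "q1", "q2", "q2", "q2"),
  product = c("book", "book", "book", "tv", "tv")
)

# := adds the group total to every row in place; the row count is unchanged
toy[, group_revenue := sum(revenue), by = .(quarter, product)]

# A named expression inside .() returns one summary row per group
totals <- toy[, .(group_revenue = sum(revenue)), by = .(quarter, product)]
totals
# q1/book = 300, q2/book = 300, q2/tv = 900
```

Note that setDT(df) in the snippets above converts df to a data.table in place, so later code sees the converted object.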
