
Aggregate on two columns

I have a dataset with prices of items from different branches of a store that looks a bit like this:

Item,Chain,Branch1,Branch2,Branch3
Laptop,Sears,1000,1100,900
Laptop,JCP,1300,900,1200
Laptop,Macys,1500,1800,1700
TV,Sears,800,600,700
TV,JCP,400,600,700
TV,Macys,900,1000,1100

What I want: For each unique combination of Item and Chain, calculate the median price from the three branches.

I tried something along the lines of

aggregate(data[,3:5], list(data$Item, data$Chain), median)

But it didn't work. Any ideas on how I can solve this problem?

You can use dplyr's group_by() and summarise():

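library(dplyr)

# tibble() replaces the deprecated data_frame()
df <- tibble(Item = c("Laptop","Laptop","Laptop","TV","TV","TV"),
             Chain = c("Sears","JCP","Macys","Sears","JCP","Macys"),
             Branch1 = c(1000,1300,1500,800,400,900),
             Branch2 = c(1100,900,1800,600,600,1000),
             Branch3 = c(900,1200,1700,700,700,1100))

# c() pools the three branch prices of each Item/Chain group into one
# vector, so median() is taken across branches rather than per column
df %>%
  group_by(Item, Chain) %>%
  summarise(median = median(c(Branch1, Branch2, Branch3)))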
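A row-wise variant of the same idea, sketched under the assumption that dplyr >= 1.0.0 is installed (for c_across()). It keeps every original column and adds a per-row median. On this data the result matches the grouped version, because each Item/Chain pair has exactly one row; if a pair could appear in several rows, the grouped version pools all of them while this one does not.

```r
library(dplyr)

# Same example data as above
df <- tibble(Item    = c("Laptop", "Laptop", "Laptop", "TV", "TV", "TV"),
             Chain   = c("Sears", "JCP", "Macys", "Sears", "JCP", "Macys"),
             Branch1 = c(1000, 1300, 1500, 800, 400, 900),
             Branch2 = c(1100, 900, 1800, 600, 600, 1000),
             Branch3 = c(900, 1200, 1700, 700, 700, 1100))

# rowwise() treats each row as its own group; c_across() collects the
# selected columns of the current row into a single vector
res <- df %>%
  rowwise() %>%
  mutate(median = median(c_across(Branch1:Branch3))) %>%
  ungroup()

res
```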

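The issue is that aggregate() applies median() to each column separately within each group. Since every Item/Chain combination occupies a single row here, that just returns the original Branch1, Branch2 and Branch3 values instead of one median across the three branches.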
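To stay in base R with aggregate(), one option is to move all branch prices into a single column first, so that each group's prices reach median() together. This is a minimal sketch assuming the question's data is read with read.csv(); the object names per_column, long and price are illustrative, not from the original post.

```r
# Example data from the question (base R only)
dat <- read.csv(text = "Item,Chain,Branch1,Branch2,Branch3
Laptop,Sears,1000,1100,900
Laptop,JCP,1300,900,1200
Laptop,Macys,1500,1800,1700
TV,Sears,800,600,700
TV,JCP,400,600,700
TV,Macys,900,1000,1100")

# The original call: one median per column within each group.
# Each Item/Chain pair has a single row, so this echoes the inputs back.
per_column <- aggregate(dat[, 3:5], list(Item = dat$Item, Chain = dat$Chain), median)

# Stack the three branch columns into one long column
long <- data.frame(
  Item  = rep(dat$Item, times = 3),
  Chain = rep(dat$Chain, times = 3),
  price = c(dat$Branch1, dat$Branch2, dat$Branch3)
)

# Now median() sees all three prices of a group at once
res <- aggregate(price ~ Item + Chain, data = long, FUN = median)
res
```

Note that aggregate() sorts its output by the grouping variables, so the row order differs from the input.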

For the sake of completeness, here are some alternative approaches:

1. Base R row-wise apply()

dat$median <- apply(dat[, 3:5], 1L, median)
dat
     Item Chain Branch1 Branch2 Branch3 median
1: Laptop Sears    1000    1100     900   1000
2: Laptop   JCP    1300     900    1200   1200
3: Laptop Macys    1500    1800    1700   1700
4:     TV Sears     800     600     700    700
5:     TV   JCP     400     600     700    600
6:     TV Macys     900    1000    1100   1000
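For exactly three branch columns, a loop-free base R alternative to apply() is possible: the median of three numbers x, y, z equals max(min(x, y), min(max(x, y), z)), and pmin()/pmax() evaluate that element-wise, i.e. once per row. This is a sketch only; the helper median3() is hypothetical and works for three columns, not an arbitrary number.

```r
dat <- read.csv(text = "Item,Chain,Branch1,Branch2,Branch3
Laptop,Sears,1000,1100,900
Laptop,JCP,1300,900,1200
Laptop,Macys,1500,1800,1700
TV,Sears,800,600,700
TV,JCP,400,600,700
TV,Macys,900,1000,1100")

# Hypothetical helper: median of three vectors, computed row by row.
# pmin(x, y) and pmax(x, y) give the smaller/larger of each pair;
# clamping z between them yields the middle value.
median3 <- function(x, y, z) pmax(pmin(x, y), pmin(pmax(x, y), z))

dat$median <- median3(dat$Branch1, dat$Branch2, dat$Branch3)
dat
```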

2. data.table

library(data.table)
setDT(dat)[, .(median = median(c(Branch1, Branch2, Branch3))), by = .(Item, Chain)]
     Item Chain median
1: Laptop Sears   1000
2: Laptop   JCP   1200
3: Laptop Macys   1700
4:     TV Sears    700
5:     TV   JCP    600
6:     TV Macys   1000

3. data.table after reshaping to long format

Following neilfws' suggestion to reshape from wide to long format before aggregating:

library(data.table)
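# List the Branch columns explicitly so that any other column (e.g. the
# median column added in approach 1) is not melted into the values
melt(setDT(dat), id.vars = c("Item", "Chain"),
     measure.vars = c("Branch1", "Branch2", "Branch3"))[, .(median = median(value)), by = .(Item, Chain)]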
     Item Chain median
1: Laptop Sears   1000
2: Laptop   JCP   1200
3: Laptop Macys   1700
4:     TV Sears    700
5:     TV   JCP    600
6:     TV Macys   1000
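The same reshape-then-aggregate idea in the tidyverse, sketched under the assumption that tidyr >= 1.0.0 (for pivot_longer()) and dplyr >= 1.0.0 (for the .groups argument) are installed. The column names branch and price are illustrative.

```r
library(dplyr)
library(tidyr)

dat <- read.csv(text = "Item,Chain,Branch1,Branch2,Branch3
Laptop,Sears,1000,1100,900
Laptop,JCP,1300,900,1200
Laptop,Macys,1500,1800,1700
TV,Sears,800,600,700
TV,JCP,400,600,700
TV,Macys,900,1000,1100")

res <- dat %>%
  # wide -> long: one row per Item/Chain/branch combination
  pivot_longer(starts_with("Branch"), names_to = "branch", values_to = "price") %>%
  group_by(Item, Chain) %>%
  # one median over all branch prices of the group; drop grouping afterwards
  summarise(median = median(price), .groups = "drop")

res
```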

Data

As data and df are also names of R functions (utils::data() and stats::df()), I will use a different name to avoid the risk of hard-to-debug name clashes:

dat <- data.table::fread("
Item,Chain,Branch1,Branch2,Branch3
Laptop,Sears,1000,1100,900
Laptop,JCP,1300,900,1200
Laptop,Macys,1500,1800,1700
TV,Sears,800,600,700
TV,JCP,400,600,700
TV,Macys,900,1000,1100")
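If data.table is not installed, the same data can be read with base R's read.csv() via its text argument. This gives a plain data.frame: approach 1 works on it unchanged (printed without the "1:" row labels), while approaches 2 and 3 convert it with setDT() anyway.

```r
# Base R alternative to data.table::fread(); no packages required
dat <- read.csv(text = "Item,Chain,Branch1,Branch2,Branch3
Laptop,Sears,1000,1100,900
Laptop,JCP,1300,900,1200
Laptop,Macys,1500,1800,1700
TV,Sears,800,600,700
TV,JCP,400,600,700
TV,Macys,900,1000,1100")

str(dat)
```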

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM