I have a dataset with prices of items from different branches of a store that looks a bit like this:
Item,Chain,Branch1,Branch2,Branch3
Laptop,Sears,1000,1100,900
Laptop,JCP,1300,900,1200
Laptop,Macys,1500,1800,1700
TV,Sears,800,600,700
TV,JCP,400,600,700
TV,Macys,900,1000,1100
What I want: For each unique combination of Item and Chain, calculate the median price from the three branches.
I tried something along the lines of
aggregate(data[,3:5], list(data$Item, data$Chain), median)
But it didn't work. Any ideas on how I can solve this problem?
You can use group_by() and summarise():
library(dplyr)
# tibble() replaces the deprecated data_frame(); dplyr re-exports it
df <- tibble(Item = c("Laptop","Laptop","Laptop","TV","TV","TV"),
             Chain = c("Sears","JCP","Macys","Sears","JCP","Macys"),
             Branch1 = c(1000,1300,1500,800,400,900),
             Branch2 = c(1100,900,1800,600,600,1000),
             Branch3 = c(900,1200,1700,700,700,1100))
# Pool the three branch prices within each Item/Chain group and take their median
df %>%
  group_by(Item, Chain) %>%
  summarise(median = median(c(Branch1, Branch2, Branch3)))
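The issue is that aggregate() aggregates each column separately, so it returns one median per branch column for each group instead of a single median across the three branches.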
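If you would rather stay with aggregate(), one possible workaround is to stack the branch columns into a single value column first, so that median() sees all three prices of a group at once. The following is only a minimal base R sketch: the names dat, long, price and res are my own choices for illustration and do not come from the question.

```r
# Sample data from the question, rebuilt as a plain data.frame
dat <- data.frame(
  Item    = c("Laptop", "Laptop", "Laptop", "TV", "TV", "TV"),
  Chain   = c("Sears", "JCP", "Macys", "Sears", "JCP", "Macys"),
  Branch1 = c(1000, 1300, 1500, 800, 400, 900),
  Branch2 = c(1100, 900, 1800, 600, 600, 1000),
  Branch3 = c(900, 1200, 1700, 700, 700, 1100)
)

# Stack the three branch columns into one long 'price' column,
# repeating Item and Chain once per branch (hypothetical helper frame)
long <- data.frame(
  Item  = rep(dat$Item, times = 3),
  Chain = rep(dat$Chain, times = 3),
  price = c(dat$Branch1, dat$Branch2, dat$Branch3)
)

# With a single value column, aggregate() computes the median across branches
res <- aggregate(price ~ Item + Chain, data = long, FUN = median)
res
```

The result has one row per Item/Chain combination, although the row order may differ from the input order.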
For the sake of completeness, here are some alternative approaches:
apply()
dat$median <- apply(dat[, 3:5], 1L, median)
dat
     Item Chain Branch1 Branch2 Branch3 median
1: Laptop Sears    1000    1100     900   1000
2: Laptop   JCP    1300     900    1200   1200
3: Laptop Macys    1500    1800    1700   1700
4:     TV Sears     800     600     700    700
5:     TV   JCP     400     600     700    600
6:     TV Macys     900    1000    1100   1000
data.table
library(data.table)
setDT(dat)[, .(median = median(c(Branch1, Branch2, Branch3))), by = .(Item, Chain)]
     Item Chain median
1: Laptop Sears   1000
2: Laptop   JCP   1200
3: Laptop Macys   1700
4:     TV Sears    700
5:     TV   JCP    600
6:     TV Macys   1000
data.table after reshaping to long format
Following neilfws' suggestion to reshape from wide to long format before aggregating:
library(data.table)
melt(setDT(dat), c("Item", "Chain"))[, .(median = median(value)), by = .(Item, Chain)]
     Item Chain median
1: Laptop Sears   1000
2: Laptop   JCP   1200
3: Laptop Macys   1700
4:     TV Sears    700
5:     TV   JCP    600
6:     TV Macys   1000
As data and df are names of R functions, I will use a different name, dat, to avoid the risk of hard-to-debug name clashes:
dat <- data.table::fread("
Item,Chain,Branch1,Branch2,Branch3
Laptop,Sears,1000,1100,900
Laptop,JCP,1300,900,1200
Laptop,Macys,1500,1800,1700
TV,Sears,800,600,700
TV,JCP,400,600,700
TV,Macys,900,1000,1100")