I have the following data frame. For each week, I want to calculate the weighted average price to date (Avg_price weighted by Num_items).
Existing data frame:
> df
week Avg_price Num_items
1 100 10
2 120 8
3 90 5
4 110 20
Desired data frame:
> df
week Avg_price Num_items Avg_price_toDate
1 100 10 100
2 120 8 108.89
3 90 5 104.78
4 110 20 107.21
I've figured out how to do it with a basic for loop that tracks the cumulative number of items to date and the previous Avg_price_toDate. Is there a better way to do this in R? I'd also like to be able to segment the data frame by different product groupings.
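The loop itself isn't shown, so here is a minimal sketch of what such a recurrence might look like (the function name `avg_to_date_loop` is hypothetical). Each step multiplies the previous running average by the previous cumulative item count to recover revenue so far, adds the current week's revenue, and divides by the new cumulative count. It assumes the rows are already sorted by week.

```r
# Hypothetical helper: running weighted average via an explicit loop.
# Assumes rows are already sorted by week.
avg_to_date_loop <- function(price, n) {
  out <- numeric(length(price))
  cum_n <- 0      # cumulative items up to the previous week
  prev_avg <- 0   # weighted average up to the previous week
  for (i in seq_along(price)) {
    new_cum_n <- cum_n + n[i]
    # prev_avg * cum_n is the revenue so far; add this week's revenue
    # and divide by the new item total.
    out[i] <- (prev_avg * cum_n + price[i] * n[i]) / new_cum_n
    cum_n <- new_cum_n
    prev_avg <- out[i]
  }
  out
}

df <- data.frame(week = 1:4,
                 Avg_price = c(100, 120, 90, 110),
                 Num_items = c(10, 8, 5, 20))
df$Avg_price_toDate <- avg_to_date_loop(df$Avg_price, df$Num_items)
print(df)
```

The vectorised cumsum answers compute the same numbers without carrying state through a loop.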
Yes, you can use cumsum to compute cumulative weighted averages as well:
transform(df,Avg_price_toDate=cumsum(Avg_price*Num_items)/cumsum(Num_items))
week Avg_price Num_items Avg_price_toDate
1 1 100 10 100.0000
2 2 120 8 108.8889
3 3 90 5 104.7826
4 4 110 20 107.2093
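Why cumsum gives the right answer: the weighted average to date at week t is total revenue through week t divided by total items through week t, where p_s is Avg_price and n_s is Num_items in week s. Both numerator and denominator are running sums, which is exactly what cumsum(Avg_price*Num_items) and cumsum(Num_items) produce. Worked through for the example data:

```latex
\begin{aligned}
\bar{p}_t &= \frac{\sum_{s \le t} p_s \, n_s}{\sum_{s \le t} n_s} \\
\bar{p}_1 &= \frac{100 \cdot 10}{10} = 100 \\
\bar{p}_2 &= \frac{1000 + 120 \cdot 8}{10 + 8} = \frac{1960}{18} \approx 108.8889 \\
\bar{p}_3 &= \frac{1960 + 90 \cdot 5}{18 + 5} = \frac{2410}{23} \approx 104.7826 \\
\bar{p}_4 &= \frac{2410 + 110 \cdot 20}{23 + 20} = \frac{4610}{43} \approx 107.2093
\end{aligned}
```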
Here is a more general solution with data.table that can handle categories:
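library(data.table)

dt <- data.table(category  = c(rep("a", 4), rep("b", 4)),
                 week      = c(1, 2, 3, 4,
                               1, 2, 3, 4),
                 Avg_price = c(100, 120, 90, 110,
                               150, 200, 250, 300),
                 Num_items = c( 10,   8,  5,  20,
                                20,  30, 40,  50))

# := adds the column by reference; by = "category" restarts the running
# sums for each category. The outer parentheses make the result print.
(dt[, wtd := cumsum(Avg_price * Num_items) / cumsum(Num_items),
    by = "category"])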
which gives this:
category week Avg_price Num_items wtd
1: a 1 100 10 100.0000
2: a 2 120 8 108.8889
3: a 3 90 5 104.7826
4: a 4 110 20 107.2093
5: b 1 150 20 150.0000
6: b 2 200 30 180.0000
7: b 3 250 40 211.1111
8: b 4 300 50 242.8571
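If you'd rather avoid a package dependency, base R's `ave()` applies a function within each group and returns a vector aligned with the original rows. A minimal sketch on the same data as a plain data frame (the name `df2` is only for illustration); like the data.table version, it assumes rows within each category are sorted by week:

```r
df2 <- data.frame(category  = rep(c("a", "b"), each = 4),
                  week      = rep(1:4, times = 2),
                  Avg_price = c(100, 120, 90, 110, 150, 200, 250, 300),
                  Num_items = c(10, 8, 5, 20, 20, 30, 40, 50))

# ave() runs cumsum separately within each category and puts the
# results back in the original row positions.
cum_revenue <- ave(df2$Avg_price * df2$Num_items, df2$category, FUN = cumsum)
cum_items   <- ave(df2$Num_items, df2$category, FUN = cumsum)
df2$wtd <- cum_revenue / cum_items
print(df2)
```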
Using dplyr:
library(dplyr)
df %>% mutate(Avg_price_toDate = cumsum(Avg_price*Num_items)/cumsum(Num_items))
Using sqldf:
library(sqldf)
sqldf('SELECT a.*,
              SUM(b.Avg_price * b.Num_items * 1.0) / SUM(b.Num_items) AS Avg_price_toDate
       FROM df AS a, df AS b
       WHERE b.week <= a.week
       GROUP BY a.week
       ORDER BY a.week')
Output:
week Avg_price Num_items Avg_price_toDate
1 1 100 10 100.0000
2 2 120 8 108.8889
3 3 90 5 104.7826
4 4 110 20 107.2093
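The self-join pairs each week a with every week b up to and including it, then aggregates per a.week. (The `* 1.0` forces floating-point division, since SQLite truncates when dividing integers.) The same non-equi logic can be sketched directly in base R; it makes visible that each output row rescans all earlier rows, so the work grows quadratically with the number of weeks, whereas cumsum is a single pass:

```r
df <- data.frame(week = 1:4,
                 Avg_price = c(100L, 120L, 90L, 110L),
                 Num_items = c(10L, 8L, 5L, 20L))

# For each week w, keep the rows matching "b.week <= a.week" and
# aggregate them, mirroring GROUP BY a.week in the query.
df$Avg_price_toDate <- sapply(df$week, function(w) {
  b <- df[df$week <= w, ]
  sum(as.numeric(b$Avg_price) * b$Num_items) / sum(b$Num_items)
})
print(df)
```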
Data:
df <- structure(list(week = 1:4, Avg_price = c(100L, 120L, 90L, 110L
), Num_items = c(10L, 8L, 5L, 20L)), .Names = c("week", "Avg_price",
"Num_items"), class = "data.frame", row.names = c(NA, -4L))
Combining everyone's answers, here is my implementation for calculating the average to date grouped by different categories.
Data frame:
week Avg_price Num_items type_item
1 1 100 10 1
2 2 120 8 1
3 3 90 5 2
4 4 110 20 2
Using dplyr:
library(dplyr)

df %>%
  group_by(type_item) %>%
  mutate(avg.price.by.type = cumsum(Avg_price * Num_items) / cumsum(Num_items))
Output:
week Avg_price Num_items type_item avg.price.by.type
1 1 100 10 1 100.0000
2 2 120 8 1 108.8889
3 3 90 5 2 90.0000
4 4 110 20 2 106.0000
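One caveat applies to every cumsum-based answer here: the running sums follow row order, so rows must be sorted by week within each group first. A base-R sketch with deliberately shuffled rows (the shuffle is only for illustration):

```r
df <- data.frame(week      = c(4, 2, 3, 1),
                 Avg_price = c(110, 120, 90, 100),
                 Num_items = c(20, 8, 5, 10),
                 type_item = c(2, 1, 2, 1))

# Sort by group, then by week, so cumsum accumulates in time order.
df <- df[order(df$type_item, df$week), ]

# Grouped running revenue divided by grouped running item count.
df$avg.price.by.type <- ave(df$Avg_price * df$Num_items, df$type_item, FUN = cumsum) /
  ave(df$Num_items, df$type_item, FUN = cumsum)
print(df)
```

In the dplyr pipeline, the equivalent step would be adding arrange(type_item, week) before group_by().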