Calculate average to date in R

I have the following data frame. I want to calculate the weighted average to date for each of the weeks.

Existing data frame:

> df
  week Avg_price Num_items
    1       100        10
    2       120         8
    3        90         5
    4       110        20

Desired data frame:

> df
  week Avg_price Num_items  Avg_price_toDate
    1       100        10                100
    2       120         8                108.89
    3        90         5                104.78
    4       110        20                107.21

I've figured out how to do it with a basic for loop that tracks the cumulative number of items to date and the previous Avg_price_toDate. I'm wondering whether there is a better way to do it in R, since I would also like to segment the data frame by different product groupings.

Yes, you can use cumsum to compute a cumulative (to-date) weighted average: divide the running sum of price times quantity by the running sum of quantity.

transform(df,Avg_price_toDate=cumsum(Avg_price*Num_items)/cumsum(Num_items))
  week Avg_price Num_items Avg_price_toDate
1    1       100        10         100.0000
2    2       120         8         108.8889
3    3        90         5         104.7826
4    4       110        20         107.2093
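
As a sanity check on the cumsum approach, the sketch below recomputes each week's figure with weighted.mean over rows 1 to k and compares it with the vectorised result. The data frame is rebuilt from the table in the question; the names to_date and check are only illustrative.

```r
# Data copied from the question
df <- data.frame(week      = 1:4,
                 Avg_price = c(100, 120, 90, 110),
                 Num_items = c(10, 8, 5, 20))

# Vectorised version: running revenue divided by running item count
to_date <- cumsum(df$Avg_price * df$Num_items) / cumsum(df$Num_items)

# Explicit version: weighted mean over rows 1..k for each week k
check <- sapply(seq_len(nrow(df)), function(k) {
  weighted.mean(df$Avg_price[1:k], df$Num_items[1:k])
})

# Both approaches should agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(to_date, check)))
round(to_date, 2)
#> [1] 100.00 108.89 104.78 107.21
```

The explicit version does quadratic work, so it is only useful as a check; the cumsum version is the one to keep.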

Here is a more general solution with data.table that can handle categories.

library(data.table)
dt <- data.table(category = c(rep("a", 4), rep("b", 4)),
                 week = c(1, 2, 3, 4,
                          1, 2, 3, 4),
                 Avg_price = c(100, 120,  90, 110,
                               150, 200, 250, 300),
                 Num_items = c( 10,   8,   5,  20,
                                20,  30,  40, 50))
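# := adds the column by reference; the outer parentheses print the result.
# cumsum follows row order, so rows must be sorted by week within each category.
(dt[, wtd := cumsum(Avg_price * Num_items) / cumsum(Num_items),
      by = "category"])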

which gives this:

   category week Avg_price Num_items      wtd
1:        a    1       100        10 100.0000
2:        a    2       120         8 108.8889
3:        a    3        90         5 104.7826
4:        a    4       110        20 107.2093
5:        b    1       150        20 150.0000
6:        b    2       200        30 180.0000
7:        b    3       250        40 211.1111
8:        b    4       300        50 242.8571
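
If you would rather not add a dependency, a base R version of the grouped calculation might look like the sketch below. It uses ave() to apply cumsum within each category. The data mirror the data.table example above; the object names sales and revenue are assumptions for illustration, and the rows are assumed to be sorted by week within each category already.

```r
# Same values as the data.table example, as a plain data frame
sales <- data.frame(category  = rep(c("a", "b"), each = 4),
                    week      = rep(1:4, times = 2),
                    Avg_price = c(100, 120,  90, 110,
                                  150, 200, 250, 300),
                    Num_items = c( 10,   8,   5,  20,
                                   20,  30,  40,  50))

# Revenue per row, then running totals within each category
revenue <- sales$Avg_price * sales$Num_items
sales$wtd <- ave(revenue, sales$category, FUN = cumsum) /
             ave(sales$Num_items, sales$category, FUN = cumsum)

sales
```

Note that FUN has to be passed by name in ave(), because every unnamed argument after the first is treated as a grouping variable.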

Using dplyr:

library(dplyr)
df %>% mutate(Avg_price_toDate = cumsum(Avg_price*Num_items)/cumsum(Num_items))

Using sqldf:

library(sqldf)
sqldf('SELECT a.*, SUM(b.Avg_price*b.Num_items*1.0)/SUM(b.Num_items) AS Avg_price_toDate
      FROM df AS a, df AS b WHERE b.week <= a.week
      GROUP BY a.week')

Output:

  week Avg_price Num_items Avg_price_toDate
1    1       100        10         100.0000
2    2       120         8         108.8889
3    3        90         5         104.7826
4    4       110        20         107.2093

Data:

df <- structure(list(week = 1:4, Avg_price = c(100L, 120L, 90L, 110L
), Num_items = c(10L, 8L, 5L, 20L)), .Names = c("week", "Avg_price", 
"Num_items"), class = "data.frame", row.names = c(NA, -4L))

Combining everyone's answers, here is my implementation for calculating the average to date grouped by different categories.

Data frame:

  week Avg_price Num_items type_item
1    1       100        10         1
2    2       120         8         1
3    3        90         5         2
4    4       110        20         2

Using dplyr:

library(dplyr)
df %>%
  group_by(type_item) %>%
  mutate(avg.price.by.type = cumsum(Avg_price * Num_items) / cumsum(Num_items))

Output:

  week Avg_price Num_items type_item avg.price.by.type
1    1       100        10         1          100.0000
2    2       120         8         1          108.8889
3    3        90         5         2           90.0000
4    4       110        20         2          106.0000
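
One caveat that applies to all of the cumsum-based answers: the running totals follow row order, so the data need to be sorted by week within each group first. A minimal sketch of that, assuming a shuffled copy of the data frame above (the names shuffled and result are made up for illustration):

```r
library(dplyr)

# Same rows as above, deliberately out of week order
shuffled <- data.frame(week      = c(3, 1, 4, 2),
                       Avg_price = c(90, 100, 110, 120),
                       Num_items = c(5, 10, 20, 8),
                       type_item = c(2, 1, 2, 1))

result <- shuffled %>%
  arrange(type_item, week) %>%   # sort before taking running totals
  group_by(type_item) %>%
  mutate(avg.price.by.type = cumsum(Avg_price * Num_items) / cumsum(Num_items)) %>%
  ungroup()                      # drop grouping so later steps are not affected

result
```

Without the arrange() step, week 3 would be treated as the first week of its group simply because it appears first.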
