简体   繁体   中英

(dplyr) Sum of N values most recent to a date

I'm trying to create a function that sums the closest n values to a given date. So if I had 5 weeks of data, and n=2, the value on week 1 would be the sum of weeks 2&3, the value on week 2 would be the sum of weeks 1&3, etc. Example:

library(dplyr)
library(data.table)

Week <- 1:5
Sales <- c(1, 3, 5, 7, 9)

frame <- data.table(Week, Sales)
frame
   Week Sales  Recent
1:    1     1    8
2:    2     3    6
3:    3     5    10
4:    4     7    14
5:    5     9    12

I want to make a function that does this for me with an input for most recent n (not just 2), but for now I want to get 2 right. Here's my function using lag/lead:

RecentSum = function(Variable, Lags){
  Sum = 0
  for(i in 1:(Lags/2)){ #Lags/2 because I want half values before and half after
    #Check to see if you can go backwards. If not, go foward (i.e. use lead).
    if(is.na(lag(Variable, i))){
      LoopSum = lead(Variable, i)
    }
    else{
      LoopSum = lag(Variable, i)
    }
    Sum = Sum + LoopSum
  }
  for(i in 1:(Lags/2)){
    if(is.na(lead(Variable, i))){ #Check to see if you can go forward. If not, go backwards (i.e. use lag).
      LoopSum = lag(Variable, i)
    }
    else{
      LoopSum = lead(Variable, i)
    }
    Sum = Sum + LoopSum
  }
  Sum
}

When I do RecentSum(frame$Sale,2) I get [1] 6 10 14 18 NA which is wrong for a number of reasons:

  1. My if statements are only hitting on week one, so it will always be NA for lag and always be non-NA for lead.
  2. I need to have a way to see if it uses lag/lead the first time. The first value is 6 instead of 8 because the first for-loop sends it to lead(_,1), but then the second for-loop does the same. I can't think of how I'd make my second for-loop recognize this.

Is there a function or library ( Zoo ?) that makes this task easy? I'd like to get my own function to work for the sake of practice/understanding, but at this point I'd rather just get it done.

Thanks!

To elaborate on my comment, lead and lag are functions that are meant to be used within vectorized functions such as dplyr . Here is a way to do it within dplyr without using a function:

df <- tibble(week = Week, sales = Sales)

df %>%
  mutate(recent = case_when(is.na(lag(sales)) ~ lead(sales, n = 1) + lead(sales, n = 2),
                            is.na(lead(sales)) ~ lag(sales, n = 1) + lag(sales, n = 2),
                            TRUE ~ lag(sales) + lead(sales)))

That gives you this:

# A tibble: 5 x 3
   week sales recent
   <int> <dbl>  <dbl>
1     1     1      8
2     2     3      6
3     3     5     10
4     4     7     14
5     5     9     12

1) Assuming that k is even define to as a vector of indices such that for each element of to we sum the k+1 elements of Sales that end in that index and from that subtract Sales :

k <- 2  # number of elements to sum
n <- nrow(frame)
to <- pmax(k+1, pmin(1:n + k/2, n))
Sum <- function(to, Sales) sum(Sales[seq(to = to, length = k+1)])
frame %>% mutate(recent = sapply(to, Sum, Sales) - Sales)

giving:

  Week Sales recent
1    1     1      8
2    2     3      6
3    3     5     10
4    4     7     14
5    5     9     12    

Note that by replacing the last line of code above with the following line the solution can be done entirely in base R:

transform(frame, recent = sapply(to, Sum, Sales) - Sales)

2) This concatenates the appropriate elements before and after the Sales series so that an ordinary rolling sum gives the result.

library(zoo)

ix <- c(seq(to = k+1, length = k/2), 1:n, seq(to = n-k, length = k/2))
frame %>% mutate(recent = rollsum(Sales[ix], k+1) - Sales)

Note that if k=2 then it reduces this to this one-liner:

frame %>% mutate(recent = rollsum(Sales[c(3, 1:n(), n()-2)], 3) - Sales)

giving:

  Week Sales recent
1    1     1      8
2    2     3      6
3    3     5     10
4    4     7     14
5    5     9     12

Update: fixed for k > 2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM