简体   繁体   中英

Conditional sum in rolling window

I am quite new to R, so apologies in advance if I state something wrong :)

I have a dataframe consisting of 395 rows and 4973 columns, sorted by months, with number of occurrence per month (ranges from 0 to eg 25) for a lot of companies. The number of occurrence was summarised from daily data grouped by month and year. My dataframe df looks something like that (only a few months and 3 companies):

Date     FirmA FirmB FirmC
01-2015  20    NA    20
02-2015  21    2     1
03-2015  22    3     2
04-2015  24    7     5
05-2015  10    10    10
06-2015  9     20    2
07-2015  13    22    1
08-2015  20    19    1

I have now the task to sum up the occurences per company by a three month rolling window from months t-3 to t-1 (the 3 previous months). However, the sum should have following conditions. It should have at least 10 occurrences during the three month window and at least 3 occurrences in month t-1. It doesn't matter if an NA is in t-3 and/or t-2, as long as the two conditions are met.

It should look like that.

Date     FirmA FirmB FirmC
01-2015  NA    NA    NA
02-2015  20    NA    20
03-2015  41    NA    NA
04-2015  63    NA    NA
05-2015  67    12    NA
06-2015  56    20    17
07-2015  43    37    NA
08-2015  32    52    NA

I have no clue, how to approach that, especially the combination of rolling window/sum (probably something with lag) and the conditions regarding which numbers to use and which not.

Here's a method that uses zoo::rollapply :

df <- structure(list(Date = c("01-2015", "02-2015", "03-2015", "04-2015", 
"05-2015", "06-2015", "07-2015", "08-2015"), FirmA = c(20L, 21L, 
22L, 24L, 10L, 9L, 13L, 20L), FirmB = c(NA, 2L, 3L, 7L, 10L, 
20L, 22L, 19L), FirmC = c(20L, 1L, 2L, 5L, 10L, 2L, 1L, 1L)), .Names = c("Date", 
"FirmA", "FirmB", "FirmC"), class = "data.frame", row.names = c(NA, 
-8L))

library(zoo)

mysum <- function(x, minprev = 3) {
  l <- length(x)
  if (l==1 || (! is.na(x[l-1]) && x[l-1] >= minprev)) sum(x[-l], na.rm = TRUE) else NA
}

winsize <- 3
# conditionally-sum
df[-1] <- lapply(df[-1], function(z) rollapply(z, winsize + 1, mysum, partial = TRUE, align = "right"))
# remove those that are insufficient in total
df[-1] <- lapply(df[-1], function(z) ifelse(z <= 10, NA, z))
df
#      Date FirmA FirmB FirmC
# 1 01-2015    NA    NA    NA
# 2 02-2015    20    NA    20
# 3 03-2015    41    NA    NA
# 4 04-2015    63    NA    NA
# 5 05-2015    67    12    NA
# 6 06-2015    56    20    17
# 7 07-2015    43    37    NA
# 8 08-2015    32    52    NA

There may be a way to not require mysum , but two things make it slightly tricky: (1) the resulting sum goes in the next field (more easily side-stepped if the window is always length 3), and (2) the conditional on the last value. It's certainly feasible to try to smooth it out, but this works well enough.

Another approach, similar in concept to r2evans', is to compute the rolling sum through cumsum (after replacing NA s with 0 s) and insert NA s when conditions are not met:

ff = function(x, w = 3, ntot = 10, nlast = 3)
{
    x[is.na(x)] = 0L
    x = c(0L, x[-length(x)])

    cs = cumsum(x)
    wcs = cs - c(numeric(w), cs[1:(length(x) - w)])

    wcs[!((wcs >= ntot) & (x >= nlast))] = NA
    return(wcs)

}

sapply(df[-1], ff)  # 'df' borrowed from r2evans' answer
#     FirmA FirmB FirmC
#[1,]    NA    NA    NA
#[2,]    20    NA    20
#[3,]    41    NA    NA
#[4,]    63    NA    NA
#[5,]    67    12    NA
#[6,]    56    20    17
#[7,]    43    37    NA
#[8,]    32    52    NA

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM