How to count the number of rows a condition has been met

Question

I have a dataframe df of stock liquidity ( dput statement below), and I need to measure how many of the previous consecutive three rows the value has been > 10:

  date     sec_id  liquidity   good       count.good.rows
2016-07-29   3277  9.142245 FALSE               0
2016-08-31   3277 11.070555  TRUE               0
2016-09-30   3277 11.934113  TRUE               1
2016-10-31   3277 12.192237  TRUE               2
2016-11-30   3277 10.165183  TRUE               3
2016-12-30   3277  8.414033 FALSE               3
2016-01-29   3426  6.494181 FALSE               0
2016-02-29   3426  8.216213 FALSE               0
2016-03-31   3426 10.081115  TRUE               0
2016-04-29   3426 10.119685  TRUE               1
2016-05-31   3426  8.659732 FALSE               2
2016-06-30   3426  6.790178 FALSE               1
2016-07-29   3426  7.234159 FALSE               0

Note a few things about the data:

There are multiple sec_id values, and I need to do this work on each sec_id value, based on the order of the data column.
I've already added the good column, but can't figure out how to do the count.good.rows column without explicitly using lag(...,1) + lag(...,2) + lag(...,3) . This would be a poor solution because I need the 3 to be a variable (I may end up wanting to look at previous 2 rows, or 4 rows).

Any ideas?

Here's my full dput :

df = structure(list(date = structure(c(16829, 16860, 16891, 16920, 16952, 16982, 17011, 17044, 17074, 17105, 17135, 17165, 16829, 16860, 16891, 16920, 16952, 16982, 17011, 17044, 17074, 17105, 17135, 17165), class = "Date"),
    sec_id = c(3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L),
    liquidity = c(4.014428, 3.779665, 4.833813, 5.244417, 7.150838, 7.639399, 9.142245, 11.070555, 11.934113, 12.192237, 10.165183, 8.414033, 6.494181, 8.216213, 10.081115, 10.119685, 8.659732, 6.790178, 7.234159, 8.529101, 9.015898, 8.307979, 8.231237, 8.711095),
    good = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)),
    class = "data.frame", .Names = c("date", "sec_id", "liquidity", "good"), 
    row.names = c(NA, -24L))

Answer 1

You could define a lagcounter function which substracts to the cumulative sum of good the cumulative sum n rows back:

lagcounter = function(x,n) {y = cumsum(x); lag(y-lag(y,n,default=0),default=0)}

And then using a dplyr syntax, use this newly defined function in a mutate statement, after having grouped by sec_id :

library(dplyr)
df %>% group_by(sec_id) %>% mutate(count.good.rows = lagcounter(good,3)) 

         date sec_id liquidity  good count.good.rows
1  2016-01-29   3277  4.014428 FALSE               0
2  2016-02-29   3277  3.779665 FALSE               0
3  2016-03-31   3277  4.833813 FALSE               0
4  2016-04-29   3277  5.244417 FALSE               0
5  2016-05-31   3277  7.150838 FALSE               0
6  2016-06-30   3277  7.639399 FALSE               0
7  2016-07-29   3277  9.142245 FALSE               0
8  2016-08-31   3277 11.070555  TRUE               0
9  2016-09-30   3277 11.934113  TRUE               1
10 2016-10-31   3277 12.192237  TRUE               2
11 2016-11-30   3277 10.165183  TRUE               3
12 2016-12-30   3277  8.414033 FALSE               3
13 2016-01-29   3426  6.494181 FALSE               0
14 2016-02-29   3426  8.216213 FALSE               0
15 2016-03-31   3426 10.081115  TRUE               0
16 2016-04-29   3426 10.119685  TRUE               1
17 2016-05-31   3426  8.659732 FALSE               2
18 2016-06-30   3426  6.790178 FALSE               2
19 2016-07-29   3426  7.234159 FALSE               1
20 2016-08-31   3426  8.529101 FALSE               0
21 2016-09-30   3426  9.015898 FALSE               0
22 2016-10-31   3426  8.307979 FALSE               0
23 2016-11-30   3426  8.231237 FALSE               0
24 2016-12-30   3426  8.711095 FALSE               0

Answer 2

Try zoo::rollapply

library(zoo)
df %>%
  group_by(sec_id) %>%
  mutate(count_good_rows = rollapply(good, 3, sum, align="right", partial=TRUE))

# A tibble: 13 x 5
# Groups: sec_id [2]
   # date       sec_id liquidity good  count_good_rows
   # <fctr>      <int>     <dbl> <lgl>           <int>
 # 1 2016-07-29   3277      9.14 F                   0
 # 2 2016-08-31   3277     11.1  T                   1
 # 3 2016-09-30   3277     11.9  T                   2
 # 4 2016-10-31   3277     12.2  T                   3
 # 5 2016-11-30   3277     10.2  T                   3
 # 6 2016-12-30   3277      8.41 F                   2
 # 7 2016-01-29   3426      6.49 F                   0
 # 8 2016-02-29   3426      8.22 F                   0
 # 9 2016-03-31   3426     10.1  T                   1
# 10 2016-04-29   3426     10.1  T                   2
# 11 2016-05-31   3426      8.66 F                   2
# 12 2016-06-30   3426      6.79 F                   1
# 13 2016-07-29   3426      7.23 F                   0

Edit If you're only interested in counting the previous three rows

df %>%
  group_by(sec_id) %>%
  mutate(count_good_rows = rollapply(dplyr::lag(good, 1), 3, function(i) sum(i, na.rm=TRUE), align="right", partial=TRUE))

# A tibble: 13 x 5
# Groups: sec_id [2]
   # date       sec_id liquidity good  count_good_rows
   # <fctr>      <int>     <dbl> <lgl>           <int>
 # 1 2016-07-29   3277      9.14 F                   0
 # 2 2016-08-31   3277     11.1  T                   0
 # 3 2016-09-30   3277     11.9  T                   1
 # 4 2016-10-31   3277     12.2  T                   2
 # 5 2016-11-30   3277     10.2  T                   3
 # 6 2016-12-30   3277      8.41 F                   3
 # 7 2016-01-29   3426      6.49 F                   0
 # 8 2016-02-29   3426      8.22 F                   0
 # 9 2016-03-31   3426     10.1  T                   0
# 10 2016-04-29   3426     10.1  T                   1
# 11 2016-05-31   3426      8.66 F                   2
# 12 2016-06-30   3426      6.79 F                   2
# 13 2016-07-29   3426      7.23 F                   1

Data

df <- read.table(text="date     sec_id  liquidity   good     
2016-07-29   3277  9.142245 FALSE 
2016-08-31   3277 11.070555  TRUE  
2016-09-30   3277 11.934113  TRUE  
2016-10-31   3277 12.192237  TRUE  
2016-11-30   3277 10.165183  TRUE  
2016-12-30   3277  8.414033 FALSE 
2016-01-29   3426  6.494181 FALSE 
2016-02-29   3426  8.216213 FALSE 
2016-03-31   3426 10.081115  TRUE  
2016-04-29   3426 10.119685  TRUE  
2016-05-31   3426  8.659732 FALSE 
2016-06-30   3426  6.790178 FALSE  
2016-07-29   3426  7.234159 FALSE  ", header=TRUE)

How to count the number of rows a condition has been met

Question

2 answers

solution1
1 2018-01-16 02:06:49

solution2
1 ACCPTED 2018-01-16 02:07:51

How to count the number of rows a condition has been met

Question

2 answers

solution1 1 2018-01-16 02:06:49

solution2 1 ACCPTED 2018-01-16 02:07:51

solution1
1 2018-01-16 02:06:49

solution2
1 ACCPTED 2018-01-16 02:07:51