R dplyr conditional sum with mutate

Question

I currently have a dataset in the below format

id, date,       category, city
1,  2016-01-01, A,        CityA
2,  2016-01-01, B,        CityA

etc.

I'm trying to use mutate to get a conditional running count over the last 30 days (or some other time frame x).

To start, I tried the following to see if it works, planning to extend it from there:

  mutate(df, last_thirty_day_count = sum(df$id < id & df$city == city))

But it just gives me zeroes.

Any help is appreciated.

Answer 1

First, here is a slightly longer example dataset

library(dplyr)

set.seed(8675309)
# tibble() replaces the deprecated data_frame()
sampleData <-
  tibble(id = 1:20
         , date = seq(as.Date("2017-01-01")
                      , as.Date("2017-01-20")
                      , by = "day")
         , category = sample(LETTERS[1:3], 20, TRUE)
         , city = sample(letters[1:3], 20, TRUE)
         )

Then, decide what counts as a qualifying observation. It is unclear from your question what cutoff(s) you want to use. Here, I am using January 4th as the cutoff (rows dated after it qualify), but you could use whatever is appropriate for your case. Then, group_by the variable you want to count within, and take a cumulative sum of the qualifying flag. This assumes that the rows are already in date order; if they are not, arrange them first.

sampleData %>%
  # Flag rows dated after the cutoff
  mutate(QualifiyingObs = date > as.Date("2017-01-04")) %>%
  group_by(city) %>%
  # Running count of flagged rows within each city
  mutate(CountOfQual = cumsum(QualifiyingObs))

Gives

      id       date category  city QualifiyingObs CountOfQual
   <int>     <date>    <chr> <chr>          <lgl>       <int>
1      1 2017-01-01        A     a          FALSE           0
2      2 2017-01-02        B     c          FALSE           0
3      3 2017-01-03        C     c          FALSE           0
4      4 2017-01-04        C     a          FALSE           0
5      5 2017-01-05        A     b           TRUE           1
6      6 2017-01-06        C     c           TRUE           1
7      7 2017-01-07        C     a           TRUE           1
8      8 2017-01-08        C     a           TRUE           2
9      9 2017-01-09        C     a           TRUE           3
10    10 2017-01-10        B     c           TRUE           2
11    11 2017-01-11        C     c           TRUE           3
12    12 2017-01-12        B     c           TRUE           4
13    13 2017-01-13        B     a           TRUE           4
14    14 2017-01-14        A     b           TRUE           2
15    15 2017-01-15        C     a           TRUE           5
16    16 2017-01-16        C     b           TRUE           3
17    17 2017-01-17        C     b           TRUE           4
18    18 2017-01-18        A     b           TRUE           5
19    19 2017-01-19        C     a           TRUE           6
20    20 2017-01-20        C     c           TRUE           5
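
As for why the attempt in the question returns zeroes: inside mutate, id and df$id are the same whole column, so df$id < id compares every element with itself, which is always FALSE, and the sum is 0. If what is wanted is a true rolling window (for each row, the number of earlier rows in the same city within the previous 30 days), one possible approach is to loop over the rows of each group. The following is only a minimal sketch under assumptions: the events data, the window_days value, and the choice to exclude the current row from its own count are all made up for illustration and are not from the question.

```r
library(dplyr)

# Hypothetical example data: one row per event, with a date and a city.
events <- tibble(
  id   = 1:6,
  date = as.Date(c("2016-01-01", "2016-01-10", "2016-01-20",
                   "2016-02-05", "2016-02-15", "2016-03-20")),
  city = c("CityA", "CityA", "CityB", "CityA", "CityB", "CityA")
)

window_days <- 30  # assumed window length; change as needed

rolling <- events %>%
  group_by(city) %>%
  arrange(date, .by_group = TRUE) %>%
  # For each row i, count rows in the same city that are strictly earlier
  # than date[i] but no more than window_days days before it.
  mutate(last_thirty_day_count = vapply(
    seq_along(date),
    function(i) sum(date < date[i] & date >= date[i] - window_days),
    integer(1)
  )) %>%
  ungroup()

print(rolling)
```

With this sample data, the CityA rows get counts 0, 1, 1, 0 and the CityB rows get 0, 1. The per-row loop is quadratic in the group size, which is fine for small groups but may be slow on large ones.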


solution1
2 ACCEPTED 2017-04-28 20:30:43
