I currently have a dataset in the format below:
id, date, category, city
1, 2016-01-01, A, CityA
2, 2016-01-01, B, CityA
etc.
I'm trying to use mutate to get a conditional running count over the last 30 days (or some other time frame x).
To start, I tried the following to see whether it works, intending to extend it from there:
mutate(df, last_thirty_day_count = sum(df$id < id & df$city == city))
But it just gives me zeroes.
Any help is appreciated.
First, here is a slightly longer example dataset:
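library(dplyr)  # provides tibble(), mutate(), group_by() and %>%

set.seed(8675309)  # make the random category/city draws reproducible
sampleData <-
  tibble(id = 1:20  # tibble() replaces the deprecated data_frame()
         , date = seq(as.Date("2017-01-01")
                      , as.Date("2017-01-20")
                      , by = "day")
         , category = sample(LETTERS[1:3], 20, TRUE)
         , city = sample(letters[1:3], 20, TRUE)
  )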
Then, decide what counts as a qualifying observation. It is unclear from your question which cutoff(s) you want to use. Here, I am using January 4th as the cutoff, but you could use whatever is appropriate for your case. Then, group_by the variable you want to count within, and take the cumulative sum. This assumes that the rows are in date order; if they are not, make sure to arrange them first.
sampleData %>%
  # flag rows dated after the cutoff
  mutate(QualifiyingObs = date > as.Date("2017-01-04")) %>%
  group_by(city) %>%
  # running count of flagged rows within each city
  mutate(CountOfQual = cumsum(QualifiyingObs))
Gives
      id date       category city  QualifiyingObs CountOfQual
   <int> <date>     <chr>    <chr> <lgl>                <int>
 1     1 2017-01-01 A        a     FALSE                    0
 2     2 2017-01-02 B        c     FALSE                    0
 3     3 2017-01-03 C        c     FALSE                    0
 4     4 2017-01-04 C        a     FALSE                    0
 5     5 2017-01-05 A        b     TRUE                     1
 6     6 2017-01-06 C        c     TRUE                     1
 7     7 2017-01-07 C        a     TRUE                     1
 8     8 2017-01-08 C        a     TRUE                     2
 9     9 2017-01-09 C        a     TRUE                     3
10    10 2017-01-10 B        c     TRUE                     2
11    11 2017-01-11 C        c     TRUE                     3
12    12 2017-01-12 B        c     TRUE                     4
13    13 2017-01-13 B        a     TRUE                     4
14    14 2017-01-14 A        b     TRUE                     2
15    15 2017-01-15 C        a     TRUE                     5
16    16 2017-01-16 C        b     TRUE                     3
17    17 2017-01-17 C        b     TRUE                     4
18    18 2017-01-18 A        b     TRUE                     5
19    19 2017-01-19 C        a     TRUE                     6
20    20 2017-01-20 C        c     TRUE                     5
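The count above uses one fixed cutoff. If you want a rolling window instead (the "last 30 days or x time frame" in the question), something like the sketch below may be closer. As for the zeroes: inside an ungrouped mutate, id and city refer to the whole columns, so df$id < id compares the column with itself element by element and is never TRUE, which makes the sum 0. The helper count_in_window and the column name last_5_day_count are hypothetical names of my own, not part of dplyr, and I am assuming the current row should not count itself. The window is 5 days only because the sample data spans 20 days; change window_days to 30 for your case.

```r
library(dplyr)

# Hypothetical helper (an assumption, not a dplyr function): for each date,
# count how many dates in the same vector fall within the `window_days` days
# before it. The current row itself is excluded.
count_in_window <- function(dates, window_days = 30) {
  vapply(
    seq_along(dates),
    function(i) sum(dates < dates[i] & dates >= dates[i] - window_days),
    integer(1)
  )
}

# Same sample data as above, rebuilt here so the sketch runs on its own
set.seed(8675309)
sampleData <- tibble(
  id = 1:20,
  date = seq(as.Date("2017-01-01"), as.Date("2017-01-20"), by = "day"),
  category = sample(LETTERS[1:3], 20, TRUE),
  city = sample(letters[1:3], 20, TRUE)
)

result <- sampleData %>%
  group_by(city) %>%   # inside mutate, `date` is now just this city's dates
  mutate(last_5_day_count = count_in_window(date, window_days = 5)) %>%
  ungroup()
```

This compares every row with every other row in its group, so it is quadratic in the group size. That should be fine for small data, but for large groups a dedicated rolling-window approach would probably be a better fit.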