I'm trying to tally how many times an event occurred for a given individual within a trailing time window. Specifically, for each new observation (a single scheduling request), I need to know how many trips the individual scheduled (trip_scheduled) during the preceding 60 days. Eventually I will also need to tally how many times that person cancelled on the same day as the scheduled trip over the preceding 60 days, but I'm starting with just the trip tally in the "moving" 60-day period.
I found some elegant answers to a similar but slightly different problem in this post: R: calculate the number of occurrences of a specific event in a specified time future
My situation differs in two ways. First, I'm looking at a preceding time period rather than a future one, and I don't know whether that changes the approach. Second, I need to run the analysis for more than 40,000 individuals, which I've been trying to do with a mix of the code I found in the other answer, a for loop (which I know is frowned upon), and dplyr grouping. It isn't working at all.
Would anyone be able to help point me in the right direction? I'd love to stick to dplyr and base. I just don't know much about data.table.
This is the code and test data I've been trying to noodle on:
test_set2 <- structure(list(tripID = c("20180112-100037-674-101", "20180112-100037-674-201",
"20180112-100037-674-301", "20180113-100037-676-101", "20180113-100037-676-201",
"20180115-100037-675-101", "20180115-100037-675-201", "20180116-100037-677-101",
"20180116-100037-677-201", "20180131-100037-678-101", "20180101-100146-707-101",
"20180101-100146-707-201", "20180102-100146-708-101", "20180102-100146-708-201",
"20180103-100146-709-101", "20180103-100146-709-201", "20180104-100146-710-101",
"20180104-100146-710-201", "20180105-100146-711-101", "20180105-100146-711-201",
"20180403-100532-223-101", "20180403-100532-223-201", "20180620-100532-224-101",
"20180620-100532-224-201", "20180704-100532-225-101", "20180704-100532-225-201",
"20180926-100532-228-101", "20180926-100532-228-201", "20180927-100532-226-101",
"20180927-100532-226-201"), CUSTOMER_ID = c(100037L, 100037L,
100037L, 100037L, 100037L, 100037L, 100037L, 100037L, 100037L,
100037L, 100146L, 100146L, 100146L, 100146L, 100146L, 100146L,
100146L, 100146L, 100146L, 100146L, 100532L, 100532L, 100532L,
100532L, 100532L, 100532L, 100532L, 100532L, 100532L, 100532L
), trip_date = structure(c(17543, 17543, 17543, 17544, 17544,
17546, 17546, 17547, 17547, 17562, 17532, 17532, 17533, 17533,
17534, 17534, 17535, 17535, 17536, 17536, 17624, 17624, 17702,
17702, 17716, 17716, 17800, 17800, 17801, 17801), class = "Date"),
trip_scheduled = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), same_day_cancel = c(1,
1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -30L), groups = structure(list(
CUSTOMER_ID = c(100037L, 100146L, 100532L), .rows = list(
1:10, 11:20, 21:30)), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
library(dplyr)

new_table <- NULL # accumulates the per-customer results
unique_customers <- unique(test_set2$CUSTOMER_ID)
for (cust in unique_customers) {
  temp_events <- test_set2 %>% filter(CUSTOMER_ID == cust)
  cs <- cumsum(temp_events$trip_scheduled) # cumulative number of trips of individual
  output_temp <- data.frame(temp_events,
    trips_minus_60 = cs[findInterval(temp_events$trip_date - 60, temp_events$trip_date, left.open = TRUE)] - cs)
  new_table <- rbind(new_table, output_temp)
}
This is the error I generated most recently:
Error in data.frame(temp_events, trips_minus_60 = cs[findInterval(temp_events$trip_date - : arguments imply differing number of rows: 10, 0
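The "10, 0" in that message can be reproduced in isolation. The sketch below uses three made-up dates (not the real test_set2) and base R only. When every `trip_date - 60` falls before the customer's first trip, `findInterval()` returns 0 for each row, and indexing a vector with 0 yields a zero-length result, so the new column has no rows. The padded lookup at the end is one possible repair, shown as an assumption rather than a tested fix for the full data; note that it counts same-day trips only up to the current row.

```r
# Hypothetical mini example: one customer, three trips within a few weeks
trip_date <- as.Date(c("2018-01-12", "2018-01-13", "2018-01-31"))
cs <- cumsum(c(1, 1, 1)) # cumulative trips: 1, 2, 3

# Every (date - 60) precedes the first trip, so each index is 0
idx <- findInterval(trip_date - 60, trip_date, left.open = TRUE)

# Indexing with 0 drops the element, so the lookup is zero-length,
# and zero-length minus anything is still zero-length
lookup <- cs[idx]
broken <- lookup - cs

# Possible repair: pad with a leading 0 so that index 0 maps to "no earlier trips",
# and subtract in the other direction to get a non-negative count
trips_60 <- cs - c(0, cs)[idx + 1]
```

With these dates `trips_60` is simply the running count, because no trip has yet aged out of the 60-day window.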
I'm not sure this meets your needs, but this is based on @Axeman's tidyverse solution you linked to. After grouping by CUSTOMER_ID, you can count, for each row, the rows where trip_scheduled is 1 and trip_date falls between the current date and 60 days prior. Both ends of the window are inclusive, so the current row and any other trips on the same day are counted. I would expect you could do something similar for same_day_cancel as well.
library(tidyverse)
test_set2 %>%
  group_by(CUSTOMER_ID) %>%
  mutate(schedule_60 = unlist(map(trip_date, ~sum(trip_scheduled == 1 & between(trip_date, . - 60, .))))) %>%
  print(n = 30)
# A tibble: 30 x 6
# Groups: CUSTOMER_ID [3]
tripID CUSTOMER_ID trip_date trip_scheduled same_day_cancel schedule_60
<chr> <int> <date> <dbl> <dbl> <int>
1 20180112-100037-674-101 100037 2018-01-12 1 1 3
2 20180112-100037-674-201 100037 2018-01-12 1 1 3
3 20180112-100037-674-301 100037 2018-01-12 1 1 3
4 20180113-100037-676-101 100037 2018-01-13 1 0 5
5 20180113-100037-676-201 100037 2018-01-13 1 0 5
6 20180115-100037-675-101 100037 2018-01-15 1 1 7
7 20180115-100037-675-201 100037 2018-01-15 1 1 7
8 20180116-100037-677-101 100037 2018-01-16 1 0 9
9 20180116-100037-677-201 100037 2018-01-16 1 0 9
10 20180131-100037-678-101 100037 2018-01-31 1 0 10
11 20180101-100146-707-101 100146 2018-01-01 1 1 2
12 20180101-100146-707-201 100146 2018-01-01 1 1 2
13 20180102-100146-708-101 100146 2018-01-02 1 1 4
14 20180102-100146-708-201 100146 2018-01-02 1 1 4
15 20180103-100146-709-101 100146 2018-01-03 1 1 6
16 20180103-100146-709-201 100146 2018-01-03 1 1 6
17 20180104-100146-710-101 100146 2018-01-04 1 1 8
18 20180104-100146-710-201 100146 2018-01-04 1 1 8
19 20180105-100146-711-101 100146 2018-01-05 1 1 10
20 20180105-100146-711-201 100146 2018-01-05 1 1 10
21 20180403-100532-223-101 100532 2018-04-03 1 0 2
22 20180403-100532-223-201 100532 2018-04-03 1 0 2
23 20180620-100532-224-101 100532 2018-06-20 1 0 2
24 20180620-100532-224-201 100532 2018-06-20 1 0 2
25 20180704-100532-225-101 100532 2018-07-04 1 0 4
26 20180704-100532-225-201 100532 2018-07-04 1 0 4
27 20180926-100532-228-101 100532 2018-09-26 1 0 2
28 20180926-100532-228-201 100532 2018-09-26 1 0 2
29 20180927-100532-226-101 100532 2018-09-27 1 0 4
30 20180927-100532-226-201 100532 2018-09-27 1 0 4
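The same windowed count should extend to same_day_cancel. Below is a minimal sketch using only dplyr and base R, on a small made-up data frame rather than test_set2. The helper `count_in_window()` and the column name `cancel_60` are hypothetical names introduced here for illustration. The window is inclusive at both ends, matching `between()` above.

```r
library(dplyr)

# Hypothetical helper: for each row, count rows in the same vector whose flag is 1
# and whose date lies in [date - days, date]
count_in_window <- function(dates, flag, days = 60) {
  vapply(seq_along(dates), function(i) {
    sum(flag == 1 & dates >= dates[i] - days & dates <= dates[i])
  }, integer(1))
}

# Made-up sample data with the same column names as the question
trips <- data.frame(
  CUSTOMER_ID     = c(1L, 1L, 1L, 2L, 2L),
  trip_date       = as.Date(c("2018-01-01", "2018-01-15", "2018-04-01",
                              "2018-02-01", "2018-02-02")),
  trip_scheduled  = c(1, 1, 1, 1, 1),
  same_day_cancel = c(1, 0, 1, 1, 1)
)

result <- trips %>%
  group_by(CUSTOMER_ID) %>% # the helper then sees one customer at a time
  mutate(
    schedule_60 = count_in_window(trip_date, trip_scheduled),
    cancel_60   = count_in_window(trip_date, same_day_cancel)
  ) %>%
  ungroup()
```

Each row is compared against every other row for the same customer, so the work grows quadratically with a customer's trip count. That is likely fine when individuals have tens or hundreds of trips, but it is an assumption worth checking against the full 40,000-customer data.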