
R: Calculating the number of occurrences within a specific time period in the past for each unique individual in a dataset in R

I'm attempting to tally the number of times an event occurred for a given individual within a specific past time period. In this particular case, I need to know, for each new observation (which reflects a single scheduling request), how many times the individual has scheduled a trip during the preceding 60 days (trip_scheduled). Eventually I will need to tally the number of times that person cancelled on the same day as the scheduled trip over the preceding 60 days. But I'm starting with just the tally over the "moving" 60-day period.

I found some elegant answers to a similar but slightly different problem in this post: R: calculate the number of occurrences of a specific event in a specified time future

My situation differs in a couple of ways. First, I'm trying to look at a previous time period, and I don't know whether that changes the approach. Second, I need to run the analysis for more than 40,000 individuals, which I've been trying to do with a mix of the code I found in the other answer, a for loop (which I know is frowned upon), and dplyr grouping. It isn't working at all.

Would anyone be able to help point me in the right direction? I'd love to stick to dplyr and base; I just don't know much about data.table.

This is the code and test data I've been trying to noodle on:

test_set2 <- structure(list(
  tripID = c("20180112-100037-674-101", "20180112-100037-674-201",
             "20180112-100037-674-301", "20180113-100037-676-101",
             "20180113-100037-676-201", "20180115-100037-675-101",
             "20180115-100037-675-201", "20180116-100037-677-101",
             "20180116-100037-677-201", "20180131-100037-678-101",
             "20180101-100146-707-101", "20180101-100146-707-201",
             "20180102-100146-708-101", "20180102-100146-708-201",
             "20180103-100146-709-101", "20180103-100146-709-201",
             "20180104-100146-710-101", "20180104-100146-710-201",
             "20180105-100146-711-101", "20180105-100146-711-201",
             "20180403-100532-223-101", "20180403-100532-223-201",
             "20180620-100532-224-101", "20180620-100532-224-201",
             "20180704-100532-225-101", "20180704-100532-225-201",
             "20180926-100532-228-101", "20180926-100532-228-201",
             "20180927-100532-226-101", "20180927-100532-226-201"),
  CUSTOMER_ID = c(100037L, 100037L, 100037L, 100037L, 100037L,
                  100037L, 100037L, 100037L, 100037L, 100037L,
                  100146L, 100146L, 100146L, 100146L, 100146L,
                  100146L, 100146L, 100146L, 100146L, 100146L,
                  100532L, 100532L, 100532L, 100532L, 100532L,
                  100532L, 100532L, 100532L, 100532L, 100532L),
  trip_date = structure(c(17543, 17543, 17543, 17544, 17544,
                          17546, 17546, 17547, 17547, 17562,
                          17532, 17532, 17533, 17533, 17534,
                          17534, 17535, 17535, 17536, 17536,
                          17624, 17624, 17702, 17702, 17716,
                          17716, 17800, 17800, 17801, 17801), class = "Date"),
  trip_scheduled = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  same_day_cancel = c(1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1,
                      1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)),
  row.names = c(NA, -30L),
  groups = structure(list(
    CUSTOMER_ID = c(100037L, 100146L, 100532L),
    .rows = list(1:10, 11:20, 21:30)),
    row.names = c(NA, -3L),
    class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE),
  class = c("grouped_df", "tbl_df", "tbl", "data.frame"))

library(dplyr)

new_table <- test_set2[0, ]  # empty accumulator with the same columns

unique_customers <- unique(test_set2$CUSTOMER_ID)

for (cust in unique_customers) {
  temp_events <- test_set2 %>% filter(CUSTOMER_ID == cust)
  cs <- cumsum(temp_events$trip_scheduled)  # cumulative number of trips for this individual
  output_temp <- data.frame(
    temp_events,
    trips_minus_60 = cs[findInterval(temp_events$trip_date - 60,
                                     temp_events$trip_date,
                                     left.open = TRUE)] - cs
  )
  new_table <- rbind(new_table, output_temp)
}

This is the error I generated most recently:

Error in data.frame(temp_events, trips_minus_60 = cs[findInterval(temp_events$trip_date - :
  arguments imply differing number of rows: 10, 0

First, the error itself: findInterval() returns 0 whenever trip_date - 60 falls before a customer's earliest trip_date, and zero indices are silently dropped when subscripting cs, so trips_minus_60 comes back shorter than temp_events (hence "differing number of rows: 10, 0").

I'm not sure this meets your needs, but the following is based on @Axeman's tidyverse solution that you linked to. After you group_by your CUSTOMER_ID, you can sum all rows where trip_scheduled is 1 and the date falls between the current date and 60 days prior. I would expect you could do something similar for same_day_cancel as well (see the sketch after the output below).

library(tidyverse)

test_set2 %>%
  group_by(CUSTOMER_ID) %>%
  mutate(schedule_60 = unlist(map(
    trip_date,
    ~ sum(trip_scheduled == 1 & between(trip_date, . - 60, .))  # trips in [date - 60, date]
  ))) %>%
  print(n = 30)

# A tibble: 30 x 6
# Groups:   CUSTOMER_ID [3]
   tripID                  CUSTOMER_ID trip_date  trip_scheduled same_day_cancel schedule_60
   <chr>                         <int> <date>              <dbl>           <dbl>       <int>
 1 20180112-100037-674-101      100037 2018-01-12              1               1           3
 2 20180112-100037-674-201      100037 2018-01-12              1               1           3
 3 20180112-100037-674-301      100037 2018-01-12              1               1           3
 4 20180113-100037-676-101      100037 2018-01-13              1               0           5
 5 20180113-100037-676-201      100037 2018-01-13              1               0           5
 6 20180115-100037-675-101      100037 2018-01-15              1               1           7
 7 20180115-100037-675-201      100037 2018-01-15              1               1           7
 8 20180116-100037-677-101      100037 2018-01-16              1               0           9
 9 20180116-100037-677-201      100037 2018-01-16              1               0           9
10 20180131-100037-678-101      100037 2018-01-31              1               0          10
11 20180101-100146-707-101      100146 2018-01-01              1               1           2
12 20180101-100146-707-201      100146 2018-01-01              1               1           2
13 20180102-100146-708-101      100146 2018-01-02              1               1           4
14 20180102-100146-708-201      100146 2018-01-02              1               1           4
15 20180103-100146-709-101      100146 2018-01-03              1               1           6
16 20180103-100146-709-201      100146 2018-01-03              1               1           6
17 20180104-100146-710-101      100146 2018-01-04              1               1           8
18 20180104-100146-710-201      100146 2018-01-04              1               1           8
19 20180105-100146-711-101      100146 2018-01-05              1               1          10
20 20180105-100146-711-201      100146 2018-01-05              1               1          10
21 20180403-100532-223-101      100532 2018-04-03              1               0           2
22 20180403-100532-223-201      100532 2018-04-03              1               0           2
23 20180620-100532-224-101      100532 2018-06-20              1               0           2
24 20180620-100532-224-201      100532 2018-06-20              1               0           2
25 20180704-100532-225-101      100532 2018-07-04              1               0           4
26 20180704-100532-225-201      100532 2018-07-04              1               0           4
27 20180926-100532-228-101      100532 2018-09-26              1               0           2
28 20180926-100532-228-201      100532 2018-09-26              1               0           2
29 20180927-100532-226-101      100532 2018-09-27              1               0           4
30 20180927-100532-226-201      100532 2018-09-27              1               0           4
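If you'd rather stay close to your base-R findInterval() idea, here is a sketch of how it could be patched (assuming rows are sorted by trip_date within each customer): treat an index of 0 as "no trips before the window" by prepending a 0 to the cumulative sum, and let a grouped mutate() replace the loop entirely:

library(dplyr)

new_table <- test_set2 %>%
  arrange(CUSTOMER_ID, trip_date) %>%  # findInterval() requires sorted dates
  group_by(CUSTOMER_ID) %>%
  mutate(
    cs  = cumsum(trip_scheduled),                                     # running trip count
    idx = findInterval(trip_date - 60, trip_date, left.open = TRUE),  # last row before the 60-day window
    trips_minus_60 = cs - c(0, cs)[idx + 1]                           # trips in [date - 60, date] up to this row
  ) %>%
  select(-cs, -idx)

One difference to be aware of: this counts rows up to and including the current one, so repeated same-day rows get 1, 2, ... rather than the full same-day total that the map() version above returns.

For the same_day_cancel tally, I would expect the same pattern to work with the condition swapped (an untested sketch, reusing map() and between() from the tidyverse code above):

test_set2 %>%
  group_by(CUSTOMER_ID) %>%
  mutate(cancel_60 = unlist(map(
    trip_date,
    ~ sum(same_day_cancel == 1 & between(trip_date, . - 60, .))  # same-day cancellations in [date - 60, date]
  )))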
