
R: Calculating the number of occurrences within a specific time period in the past for each unique individual in a dataset in R

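I am trying to count how many times an event occurred for a given individual within a specific time period in the past. In this particular case, for each new observation (each row reflects a single scheduling request), I need to know how many trips (trip_scheduled) the individual scheduled in the previous 60 days. Ultimately, I need to count how many times that person cancelled on the same day as a trip scheduled in the previous 60 days. For now, though, I am just starting with the count over a "moving" 60-day window.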

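I found some elegant answers to a similar but slightly different problem in this post: R: Calculate the number of occurrences of a specific event within a specific time in the future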

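My situation differs in two ways. First, I am looking at previous time periods rather than future ones, and I don't know whether that changes the approach. Second, I need to run the analysis for more than 40,000 people. I have been trying to do that with a mix of the code from the other answer, a for loop (I know that's frowned upon), and dplyr grouping. It simply doesn't work.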
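Looking backward rather than forward does not change the approach much; it mainly moves the 60-day offset to the other end of the window. A minimal base-R sketch, assuming one customer's dates are sorted ascending and both windows are inclusive at each end (the dates are customer 100532's trips from the test data):

```r
# Trip dates of customer 100532 in the test data, sorted ascending
dates <- as.Date(c("2018-04-03", "2018-04-03", "2018-06-20", "2018-06-20",
                   "2018-07-04", "2018-07-04", "2018-09-26", "2018-09-26",
                   "2018-09-27", "2018-09-27"))

# For sorted `dates`, findInterval(v, dates) counts the dates <= v,
# and findInterval(v, dates, left.open = TRUE) counts the dates < v
n_on_or_before <- findInterval(dates, dates)
n_before       <- findInterval(dates, dates, left.open = TRUE)

# Backward window [d - 60, d]: dates <= d minus dates < d - 60
past_60 <- n_on_or_before - findInterval(dates - 60, dates, left.open = TRUE)

# Forward window [d, d + 60]: dates <= d + 60 minus dates < d
future_60 <- findInterval(dates + 60, dates) - n_before

past_60    # 2 2 2 2 4 4 2 2 4 4
future_60  # 2 2 4 4 2 2 4 4 2 2
```

The `past_60` values agree with the `schedule_60` column that the tidyverse answer reports for customer 100532.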

Can someone point me in the right direction? I would like to stick with dplyr and base R; I just don't know data.table very well.
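Staying within base R, one loop-free option is to let `ave()` apply a per-customer function. This is a minimal sketch, assuming a plain `data.frame` with the question's column names, an inclusive window `[d - 60, d]` (which, like a `between()` check, also counts the current trip and any other trip on the same date), and rows sorted by date within each customer. `count_past_60()` is a hypothetical helper name, and the data are an eight-row extract of the test set.

```r
# Eight-row extract of the test data (customers 100146 and 100532)
trips <- data.frame(
  CUSTOMER_ID = rep(c(100146L, 100532L), each = 4),
  trip_date = as.Date(c("2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02",
                        "2018-04-03", "2018-04-03", "2018-06-20", "2018-06-20")),
  trip_scheduled = 1
)

# Hypothetical helper: for each trip date d, sum the weights w whose date lies
# in [d - window, d]. Assumes d is sorted ascending (findInterval requires it).
count_past_60 <- function(d, w, window = 60) {
  cs <- c(0, cumsum(w))                                      # cs[k + 1] = total of the first k weights
  n_upto   <- findInterval(d, d)                             # dates <= d
  n_before <- findInterval(d - window, d, left.open = TRUE)  # dates < d - window
  cs[n_upto + 1] - cs[n_before + 1]
}

# Sort once, then apply the helper per customer: ave() passes each customer's
# row positions to FUN and writes the results back into those same positions
trips <- trips[order(trips$CUSTOMER_ID, trips$trip_date), ]
trips$trips_past_60 <- ave(
  seq_len(nrow(trips)), trips$CUSTOMER_ID,
  FUN = function(i) count_past_60(trips$trip_date[i], trips$trip_scheduled[i])
)

trips$trips_past_60  # 2 2 4 4 2 2 2 2
```

Each customer costs one `cumsum()` and two binary-search `findInterval()` calls, rather than comparing every trip against every other trip, so this should scale more gently to 40,000+ customers, though that is untested at that size.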

Here is the code and test data I have been working with:

test_set2 <- structure(list(
  tripID = c("20180112-100037-674-101", "20180112-100037-674-201",
             "20180112-100037-674-301", "20180113-100037-676-101",
             "20180113-100037-676-201", "20180115-100037-675-101",
             "20180115-100037-675-201", "20180116-100037-677-101",
             "20180116-100037-677-201", "20180131-100037-678-101",
             "20180101-100146-707-101", "20180101-100146-707-201",
             "20180102-100146-708-101", "20180102-100146-708-201",
             "20180103-100146-709-101", "20180103-100146-709-201",
             "20180104-100146-710-101", "20180104-100146-710-201",
             "20180105-100146-711-101", "20180105-100146-711-201",
             "20180403-100532-223-101", "20180403-100532-223-201",
             "20180620-100532-224-101", "20180620-100532-224-201",
             "20180704-100532-225-101", "20180704-100532-225-201",
             "20180926-100532-228-101", "20180926-100532-228-201",
             "20180927-100532-226-101", "20180927-100532-226-201"),
  CUSTOMER_ID = c(100037L, 100037L, 100037L, 100037L, 100037L,
                  100037L, 100037L, 100037L, 100037L, 100037L,
                  100146L, 100146L, 100146L, 100146L, 100146L,
                  100146L, 100146L, 100146L, 100146L, 100146L,
                  100532L, 100532L, 100532L, 100532L, 100532L,
                  100532L, 100532L, 100532L, 100532L, 100532L),
  trip_date = structure(c(17543, 17543, 17543, 17544, 17544,
                          17546, 17546, 17547, 17547, 17562,
                          17532, 17532, 17533, 17533, 17534,
                          17534, 17535, 17535, 17536, 17536,
                          17624, 17624, 17702, 17702, 17716,
                          17716, 17800, 17800, 17801, 17801), class = "Date"),
  trip_scheduled = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                     1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                     1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  same_day_cancel = c(1, 1, 1, 0, 0, 1, 1, 0, 0, 0,
                      1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                      0, 0, 0, 0, 0, 0, 0, 0, 0, 0)),
  row.names = c(NA, -30L),
  groups = structure(list(
    CUSTOMER_ID = c(100037L, 100146L, 100532L),
    .rows = list(1:10, 11:20, 21:30)),
    row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE),
  class = c("grouped_df", "tbl_df", "tbl", "data.frame"))

library(dplyr)

new_table <- NULL  # accumulator for the rbind() inside the loop

unique_customers <- unique(test_set2$CUSTOMER_ID)

for (cust in unique_customers){
  temp_events <- test_set2 %>% filter(CUSTOMER_ID == cust)  # filter on the loop variable
  cs = cumsum(temp_events$trip_scheduled) # cumulative number of trips of individual
  output_temp <- data.frame(temp_events, 
                            trips_minus_60 = cs[findInterval(temp_events$trip_date - 60, temp_events$trip_date, left.open = TRUE)] - cs)
  new_table <- rbind(new_table,output_temp)

}

This is the most recent error I got:

Error in data.frame(temp_events, trips_minus_60 = cs[findInterval(temp_events$trip_date -  : 
  arguments imply differing number of rows: 10, 0
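The `10, 0` in that message comes from R's indexing rules rather than from dplyr. A minimal sketch of the mechanism; the vectors here are stand-ins, not the loop's actual objects:

```r
# Stand-in for one customer's cumulative trip count (10 scheduled trips)
cs <- cumsum(rep(1, 10))

# If all of a customer's trips fall within 60 days of each other, as for
# customer 100037 (all in January 2018), findInterval() returns 0 for every row
idx <- rep(0L, 10)

length(cs[idx])       # 0: index 0 selects nothing
length(cs[idx] - cs)  # 0: arithmetic with a zero-length vector gives zero length

# Prepending a 0 makes index 0 a valid position that means "no trips yet"
length(c(0, cs)[idx + 1])  # 10
```

With the leading 0 in place, the subtraction also needs reversing: `cs[...] - cs` as written can only be zero or negative, because the running total at the current row is never smaller than the total before the window.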

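I'm not sure whether this meets your needs, but here is a tidyverse solution based on the one you linked to. After you group_by your CUSTOMER_ID, you can sum all rows where trip_scheduled is 1 and the date falls between 60 days before the current date and the current date. I expect you can do something similar for same_day_cancel.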

library(tidyverse)

test_set2 %>%
  group_by(CUSTOMER_ID) %>%
  # for each trip date d, count this customer's scheduled trips dated within [d - 60, d]
  mutate(schedule_60 = map_int(trip_date, ~ sum(trip_scheduled == 1 & between(trip_date, .x - 60, .x)))) %>%
  print(n = 30)

# A tibble: 30 x 6
# Groups:   CUSTOMER_ID [3]
   tripID                  CUSTOMER_ID trip_date  trip_scheduled same_day_cancel schedule_60
   <chr>                         <int> <date>              <dbl>           <dbl>       <int>
 1 20180112-100037-674-101      100037 2018-01-12              1               1           3
 2 20180112-100037-674-201      100037 2018-01-12              1               1           3
 3 20180112-100037-674-301      100037 2018-01-12              1               1           3
 4 20180113-100037-676-101      100037 2018-01-13              1               0           5
 5 20180113-100037-676-201      100037 2018-01-13              1               0           5
 6 20180115-100037-675-101      100037 2018-01-15              1               1           7
 7 20180115-100037-675-201      100037 2018-01-15              1               1           7
 8 20180116-100037-677-101      100037 2018-01-16              1               0           9
 9 20180116-100037-677-201      100037 2018-01-16              1               0           9
10 20180131-100037-678-101      100037 2018-01-31              1               0          10
11 20180101-100146-707-101      100146 2018-01-01              1               1           2
12 20180101-100146-707-201      100146 2018-01-01              1               1           2
13 20180102-100146-708-101      100146 2018-01-02              1               1           4
14 20180102-100146-708-201      100146 2018-01-02              1               1           4
15 20180103-100146-709-101      100146 2018-01-03              1               1           6
16 20180103-100146-709-201      100146 2018-01-03              1               1           6
17 20180104-100146-710-101      100146 2018-01-04              1               1           8
18 20180104-100146-710-201      100146 2018-01-04              1               1           8
19 20180105-100146-711-101      100146 2018-01-05              1               1          10
20 20180105-100146-711-201      100146 2018-01-05              1               1          10
21 20180403-100532-223-101      100532 2018-04-03              1               0           2
22 20180403-100532-223-201      100532 2018-04-03              1               0           2
23 20180620-100532-224-101      100532 2018-06-20              1               0           2
24 20180620-100532-224-201      100532 2018-06-20              1               0           2
25 20180704-100532-225-101      100532 2018-07-04              1               0           4
26 20180704-100532-225-201      100532 2018-07-04              1               0           4
27 20180926-100532-228-101      100532 2018-09-26              1               0           2
28 20180926-100532-228-201      100532 2018-09-26              1               0           2
29 20180927-100532-226-101      100532 2018-09-27              1               0           4
30 20180927-100532-226-201      100532 2018-09-27              1               0           4
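The same pattern extends to the eventual goal of counting same-day cancellations. A sketch, assuming `cancel_60` should count trips in the inclusive window `[d - 60, d]` that have both `trip_scheduled == 1` and `same_day_cancel == 1`, and using `purrr::map_int()` so each per-row result is checked to be a single integer; the data are customer 100037's ten rows from the test set:

```r
library(dplyr)
library(purrr)

# Customer 100037's rows from the test data
trips <- tibble(
  CUSTOMER_ID = 100037L,
  trip_date = as.Date(c("2018-01-12", "2018-01-12", "2018-01-12", "2018-01-13",
                        "2018-01-13", "2018-01-15", "2018-01-15", "2018-01-16",
                        "2018-01-16", "2018-01-31")),
  trip_scheduled = 1,
  same_day_cancel = c(1, 1, 1, 0, 0, 1, 1, 0, 0, 0)
)

result <- trips %>%
  group_by(CUSTOMER_ID) %>%
  mutate(
    # scheduled trips dated within [d - 60, d]
    schedule_60 = map_int(trip_date, ~ sum(trip_scheduled == 1 &
                                             between(trip_date, .x - 60, .x))),
    # of those, the ones cancelled on the same day
    cancel_60 = map_int(trip_date, ~ sum(trip_scheduled == 1 & same_day_cancel == 1 &
                                           between(trip_date, .x - 60, .x)))
  ) %>%
  ungroup()

result$cancel_60  # 3 3 3 3 3 5 5 5 5 5
```

If "previous 60 days" is meant to exclude the current date (and therefore the trip being scheduled), changing the upper bound to `.x - 1` in both `between()` calls would do that.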


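Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.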

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM