Using dplyr and mutate to categorize a new variable in R

library(dplyr)
library(lubridate)

record_id <- c(1,1,1,2,3,4,4,5,6,7,8,8,9,10,10,10)
visit_date <- c("2018-09-24", "2018-12-05", "2019-03-01", "2018-10-03", "2018-10-01", "2018-10-05", "NA", "2018-08-25", "2018-09-19", "2018-10-01", "2018-09-27", "2021-09-07",
"2018-10-03", "2018-10-08", "2019-03-22", "2019-07-12")
repeat_instance <- c(0,1,2,0,0,0,1,0,0,0,0,1,0,0,1,2)
# Gaps shown in the question for these rows (for reference only)
Time_Since_Appointment <- c("NA", "72d 1H 0M 0S", "86d 0H 0M 0S", "NA", "NA", "NA", "NA",
"NA", "NA", "NA", "NA", "1076d 0H 0M 0S", "NA", "NA", "165d 0H 0M 0S", "112d 0H 0M 0S")

# Assumed construction of `data`; the explicit format turns the "NA" string into a missing date-time
data <- data.frame(record_id,
                   visit_date = as.POSIXct(visit_date, format = "%Y-%m-%d"),
                   repeat_instance)

# Time since the previous visit within each record
data1 <- data %>% 
  group_by(record_id) %>%
  mutate(Time_Since_Appointment = visit_date - lag(visit_date))

data1$Time_Since_Appointment <- seconds_to_period(data1$Time_Since_Appointment)

# The original post pipes `test` here; data1 is assumed to be what was meant
test1 <- data1 %>%
  mutate(Retention = 
           case_when(Time_Since_Appointment <= 90 ~ "Retained within 3 months", 
                     Time_Since_Appointment > 91 & Time_Since_Appointment <= 180 ~ "Retained within 6 months",
                     Time_Since_Appointment > 180 ~ "Not Retained"))

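I'm trying to create a variable that assigns a category based on the time since the last appointment. If there is no follow-up appointment, it should calculate the time from the first appointment until today.

These calculated times will be used to create 3 categories: retained within 3 months (<90 days), retained within 6 months (90-180 days), and not retained (>180 days).

I've included the code I've used so far, which worked until I used dplyr and mutate to try to create a new variable called Retention.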
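The three categories described above can be expressed directly on a plain numeric count of days. A minimal sketch with made-up day counts (`gap_days` is a hypothetical name, not from the post). By default `cut()` uses right-closed intervals, so exactly 90 days falls in the first bin, exactly 180 in the second, and a gap of 0 days would come out as NA:

```r
# Hypothetical gaps between visits, in days
gap_days <- c(72, 86, 90, 91, 165, 180, 1076, NA)

# Right-closed bins: (0, 90], (90, 180], (180, Inf]
retention <- cut(gap_days,
                 breaks = c(0, 90, 180, Inf),
                 labels = c("Retained within 3 months",
                            "Retained within 6 months",
                            "Not retained"))

print(data.frame(gap_days, retention))
```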

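The problem seems to be that you are treating a period object as if it were comparable to a number of days. When a period is compared with, or converted to, a plain number, it is measured in seconds, as you can confirm by running: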

period(1, "day") > 1000
#> [1] TRUE

as.numeric(period(1, "day"))
#> [1] 86400
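Applied to one of the gaps in the question, this means a 72-day period is not "<= 90", so in the case_when() above every non-missing gap would match the "> 180" branch. A minimal sketch, assuming only lubridate is loaded (`gap` and `gap_days` are illustrative names):

```r
library(lubridate)

gap <- period(72, "day")

# Compared with a bare number, the period counts as 72 * 86400 seconds
gap <= 90   # FALSE
gap > 180   # TRUE

# Converting to seconds and dividing by 86400 gives the day count
gap_days <- as.numeric(gap) / 86400
gap_days <= 90   # TRUE
```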

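So you need to divide the number of seconds by 86400 (the number of seconds in a day) to get the number of days. I also tend to prefer cut over case_when for binning numeric data: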

library(dplyr)
library(lubridate)

data %>% 
  group_by(record_id) %>%
  mutate(Time_Since_Appointment = visit_date - lag(visit_date),
         Time_Since_Appointment = seconds_to_period(Time_Since_Appointment),
         visit_date = as.Date(visit_date),
         Retention = cut(as.numeric(Time_Since_Appointment) / 86400,
                         breaks = c(0, 90, 180, Inf),
                         labels = c("Retained within 3 months",
                                    "Retained within 6 months",
                                    "Not retained")))
#> # A tibble: 16 x 5
#> # Groups:   record_id [10]
#>    record_id visit_date repeat_instance Time_Since_Appointment Retention               
#>        <dbl> <date>               <dbl> <Period>               <fct>                   
#>  1         1 2018-09-24               0 NA                     NA                      
#>  2         1 2018-12-05               1 72d 0H 0M 0S           Retained within 3 months
#>  3         1 2019-03-01               2 86d 0H 0M 0S           Retained within 3 months
#>  4         2 2018-10-03               0 NA                     NA                      
#>  5         3 2018-10-01               0 NA                     NA                      
#>  6         4 2018-10-05               0 NA                     NA                      
#>  7         4 NA                       1 NA                     NA                      
#>  8         5 2018-08-25               0 NA                     NA                      
#>  9         6 2018-09-19               0 NA                     NA                      
#> 10         7 2018-10-01               0 NA                     NA                      
#> 11         8 2018-09-27               0 NA                     NA                      
#> 12         8 2021-09-07               1 1076d 0H 0M 0S         Not retained            
#> 13         9 2018-10-03               0 NA                     NA                      
#> 14        10 2018-10-08               0 NA                     NA                      
#> 15        10 2019-03-22               1 165d 0H 0M 0S          Retained within 6 months
#> 16        10 2019-07-12               2 112d 0H 0M 0S          Retained within 6 months


Created on 2022-08-26 with reprex v2.0.2
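The code above leaves records with only one visit as NA, whereas the question also asked for the time from the first appointment until today when there is no follow-up. One possible reading is "for a record with a single visit, measure from that visit to today". A minimal sketch under that assumption, using a small hypothetical sample (`visits`, `gap_days`, `retention_tbl`) and a fixed `today` so the result is reproducible (swap in Sys.Date() for real use). Working with Date columns and plain day counts avoids the period conversion entirely:

```r
library(dplyr)

# Hypothetical sample: record 2 has only one visit
visits <- data.frame(
  record_id  = c(1, 1, 1, 2),
  visit_date = as.Date(c("2018-09-24", "2018-12-05", "2019-03-01", "2018-10-03"))
)

today <- as.Date("2022-08-26")  # fixed for reproducibility; use Sys.Date() in practice

retention_tbl <- visits %>%
  group_by(record_id) %>%
  mutate(
    # Single-visit records: days from that visit to today;
    # otherwise: days since the previous visit (NA for the first one)
    gap_days = if (n() == 1) as.numeric(today - visit_date)
               else as.numeric(visit_date - lag(visit_date)),
    Retention = cut(gap_days,
                    breaks = c(0, 90, 180, Inf),
                    labels = c("Retained within 3 months",
                               "Retained within 6 months",
                               "Not retained"))
  ) %>%
  ungroup()

print(retention_tbl)
```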
