
Using dplyr and mutate to categorize a new variable in R

record_id <- c(1,1,1,2,3,4,4,5,6,7,8,8,9,10,10,10)
visit_date <- c("2018-09-24", "2018-12-05", "2019-03-01", "2018-10-03", "2018-10-01", "2018-10-05", NA, "2018-08-25", "2018-09-19", "2018-10-01", "2018-09-27", "2021-09-07", 
"2018-10-03", "2018-10-08", "2019-03-22", "2019-07-12")
repeat_instance <- c(0,1,2,0,0,0,1,0,0,0,0,1,0,0,1,2)
Time_Since_Appointment <- c("NA", "72d 1H 0M 0S", "86d 0H 0M 0S", "NA", "NA", "NA", "NA",
"NA", "NA", "NA", "NA", "1076d 0H 0M 0S", "NA", "NA", "165d 0H 0M 0S", "112d 0H 0M 0S") 

library(dplyr)
library(lubridate)

# `data` holds the columns above (record_id, visit_date, repeat_instance)
data1 <- data %>% 
  group_by(record_id) %>%
  mutate(Time_Since_Appointment = visit_date - lag(visit_date))

data1$Time_Since_Appointment <- seconds_to_period(data1$Time_Since_Appointment)

test1 <- data1 %>%
  mutate(Retention = 
           case_when(Time_Since_Appointment <= 90 ~ "Retained within 3 months", 
                     Time_Since_Appointment > 90 & Time_Since_Appointment <= 180 ~ "Retained within 6 months",
                     Time_Since_Appointment > 180 ~ "Not Retained"))

I am trying to create a variable that assigns a category based on the time since the previous appointment. If there was no follow-up appointment, it should instead calculate the time between the first appointment and today.

These calculated times will then be used to create three categories: Retained within 3 months (< 90 days), Retained within 6 months (90 to 180 days), and Not retained (> 180 days).

I have included the code I have used so far, which worked until I used dplyr and mutate to try to create a new variable called Retention.

The problem appears to be that you are assuming a period object is comparable with a number of days, but when a period is compared with (or converted to) a plain number it is treated as a number of seconds, as you can confirm by running:

period(1, "day") > 1000
#> [1] TRUE

as.numeric(period(1, "day"))
#> [1] 86400
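To make the pitfall concrete, here is a minimal sketch (assuming only lubridate is installed; the 30-day value is purely illustrative) showing that a 30-day period fails a "within 90 days" test when compared with a bare number, and how period_to_seconds() recovers a day count.

```r
library(lubridate)

gap <- days(30)  # an illustrative 30-day gap stored as a Period

# Compared with a bare number, the Period is measured in seconds
# (30 * 86400 = 2592000), so this "within 90 days" check is FALSE
gap <= 90

# Convert to seconds explicitly, then divide by 86400 seconds per day
gap_days <- period_to_seconds(gap) / 86400
gap_days        # 30
gap_days <= 90  # TRUE
```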

So you need to divide the number of seconds by 86400 (the number of seconds in a day) to get the number of days. I would also tend to use cut rather than case_when for numeric data like this:
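As a quick check of how cut() handles the boundaries, here is a sketch using day counts taken from the output in this thread plus two edge values (0 and exactly 90). By default cut() builds right-closed intervals, so breaks = c(0, 90, 180, Inf) means (0, 90], (90, 180] and (180, Inf].

```r
# Day counts from the output below, plus the edge values 90 and 0
gap_days <- c(72, 86, 90, 112, 165, 1076, 0)

retention <- cut(gap_days,
                 breaks = c(0, 90, 180, Inf),  # (0, 90], (90, 180], (180, Inf]
                 labels = c("Retained within 3 months",
                            "Retained within 6 months",
                            "Not retained"))

data.frame(gap_days, retention)
```

Exactly 90 days therefore counts as retained within 3 months, and a gap of 0 days (two visits on the same date) becomes NA unless you add include.lowest = TRUE. If you want the "< 90 days" wording from the question taken literally, right = FALSE gives [0, 90), [90, 180) and [180, Inf) instead.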

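library(dplyr)
library(lubridate)

# Build the data frame from the vectors in the question
data <- data.frame(record_id, visit_date, repeat_instance)

data %>% 
  group_by(record_id) %>%
  # Convert to Date first so the subtraction below works on dates, not strings
  mutate(visit_date = as.Date(visit_date, format = "%Y-%m-%d"),
         # Gap to the previous visit in seconds, stored as a Period
         Time_Since_Appointment = seconds_to_period(
           as.numeric(difftime(visit_date, lag(visit_date), units = "secs"))),
         # A Period converts to seconds, so divide by 86400 to get days
         Retention = cut(as.numeric(Time_Since_Appointment) / 86400,
                         breaks = c(0, 90, 180, Inf),
                         labels = c("Retained within 3 months",
                                    "Retained within 6 months",
                                    "Not retained")))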
#> # A tibble: 16 x 5
#> # Groups:   record_id [10]
#>    record_id visit_date repeat_instance Time_Since_Appointment Retention               
#>        <dbl> <date>               <dbl> <Period>               <fct>                   
#>  1         1 2018-09-24               0 NA                     NA                      
#>  2         1 2018-12-05               1 72d 0H 0M 0S           Retained within 3 months
#>  3         1 2019-03-01               2 86d 0H 0M 0S           Retained within 3 months
#>  4         2 2018-10-03               0 NA                     NA                      
#>  5         3 2018-10-01               0 NA                     NA                      
#>  6         4 2018-10-05               0 NA                     NA                      
#>  7         4 NA                       1 NA                     NA                      
#>  8         5 2018-08-25               0 NA                     NA                      
#>  9         6 2018-09-19               0 NA                     NA                      
#> 10         7 2018-10-01               0 NA                     NA                      
#> 11         8 2018-09-27               0 NA                     NA                      
#> 12         8 2021-09-07               1 1076d 0H 0M 0S         Not retained            
#> 13         9 2018-10-03               0 NA                     NA                      
#> 14        10 2018-10-08               0 NA                     NA                      
#> 15        10 2019-03-22               1 165d 0H 0M 0S          Retained within 6 months
#> 16        10 2019-07-12               2 112d 0H 0M 0S          Retained within 6 months


Created on 2022-08-26 with reprex v2.0.2
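The code above leaves rows without a previous visit as NA, while the question also asks that a record with no follow-up appointment be measured from its first appointment until today. A minimal sketch of that extra rule follows. It assumes visit_date is already a Date column, treats a record as having "no follow-up" when it has exactly one non-missing visit date, and works in plain day counts instead of periods. categorise_retention() is a hypothetical helper, and today is a parameter (defaulting to Sys.Date()) so results can be reproduced.

```r
library(dplyr)

# Hypothetical helper: assumes `data` has record_id and a Date column visit_date
categorise_retention <- function(data, today = Sys.Date()) {
  data %>%
    group_by(record_id) %>%
    mutate(
      # Number of dated visits per record; 1 means no follow-up
      n_visits = sum(!is.na(visit_date)),
      gap_days = case_when(
        # A previous visit exists: days since that visit
        !is.na(dplyr::lag(visit_date)) ~
          as.numeric(visit_date - dplyr::lag(visit_date), units = "days"),
        # No follow-up at all: days from the only visit until `today`
        n_visits == 1 & !is.na(visit_date) ~
          as.numeric(today - visit_date, units = "days"),
        # First visit of a record that does have follow-ups
        TRUE ~ NA_real_
      ),
      Retention = cut(gap_days,
                      breaks = c(0, 90, 180, Inf),
                      labels = c("Retained within 3 months",
                                 "Retained within 6 months",
                                 "Not retained"))
    ) %>%
    ungroup() %>%
    select(-n_visits)
}

# Usage with a subset of the question's data and a fixed "today"
visits <- data.frame(
  record_id  = c(1, 1, 1, 2, 8, 8),
  visit_date = as.Date(c("2018-09-24", "2018-12-05", "2019-03-01",
                         "2018-10-03", "2018-09-27", "2021-09-07"))
)
out <- categorise_retention(visits, today = as.Date("2019-01-01"))
out
```

Record 2 has only one visit, so its gap is counted from 2018-10-03 to the supplied today (90 days here). Whether the last visit of a multi-visit record should also be measured against today is not specified in the question, so this sketch leaves it alone.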
