I have a dataset that represents patients visits to various doctors in a certain practice throughout a year.
Example-
doctor patient_no datetime
dr.kahn 1561 1/21/19 10:30:00
dr.gould 1397 2/06/19 12:30:00
dr.amoor 1596 2/11/19 9:00:00
dr.gould 995 10/07/19 12:30:00
dr.kahn 1561 10/14/19 9:30
I'm trying to create a new dataframe where each row is a unique doctor-patient pairing and shows the numbers of times that patient visited that doctor, along with the average time surpassed between visits for that particular patient-doctor pairing. So for instance if patient A went to dr.kahn 4 times in a year, what was the average amount of time in between patient A's consecutive appointments to dr.kahn.
Example-
doctor patient_no number_of_visits avg_time_passed_between_appointments
dr.gould 1054 7 2 months 1 days 2:00:00
dr.gould 1099 2 5 months 10 days 00:00:00
dr.kahn 875 12 0 months 26 days 0:30:00
Any help would be appreciated. Thanks!
Here's a dplyr approach:
library(tidyverse)
df %>%
mutate(datetime = lubridate::mdy_hm(datetime)) %>%
group_by(doctor, patient_no) %>%
summarize(count = n(),
avg_days_between = (max(datetime) - min(datetime)) / lubridate::ddays(count - 1)) %>%
ungroup()
## A tibble: 4 x 4
# doctor patient_no count avg_days_between
# <chr> <dbl> <int> <dbl>
#1 dr.amoor 1596 1 NaN
#2 dr.gould 995 1 NaN
#3 dr.gould 1397 1 NaN
#4 dr.kahn 1561 2 266.
Or you could calculate each lag and use a different method, like the median, to characterize avg delta.
df %>%
group_by(doctor, patient_no) %>%
mutate(datetime = lubridate::mdy_hm(datetime),
# coalesce helps
days_since_last = coalesce(c(datetime - lag(datetime))/
lubridate::ddays(1), 0)) %>%
summarize(count = n(),
median_time_between = median(days_since_last))
## A tibble: 4 x 4
# Groups: doctor [3]
# doctor patient_no count median_time_between
# <chr> <dbl> <int> <dbl>
#1 dr.amoor 1596 1 0
#2 dr.gould 995 1 0
#3 dr.gould 1397 1 0
#4 dr.kahn 1561 2 133.
sample data
df <- tibble::tribble(
~doctor, ~patient_no, ~datetime,
"dr.kahn", 1561, "1/21/19 10:30",
"dr.gould", 1397, "2/6/19 12:30",
"dr.amoor", 1596, "2/11/19 9:00",
"dr.gould", 995, "10/7/19 12:30",
"dr.kahn", 1561, "10/14/19 9:30"
)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.