
Finding a rolling count of timestamps with a conditional limit in R

Background

I have two tables. One is a table of laboratory values containing a timestamp ( charttime ). The other is a table of medications, containing a starttime and endtime (for when the medication was started and stopped, respectively). There is also a subject_id , a unique id for each patient, and a "hospital admission" id ( hadm_id ) associated with a patient being admitted to the hospital. The same patient can have multiple hadm_id s.

Goal

The goal is to obtain the number of lab values ( charttime ) up to 24 hours prior to the starttime of a given medication, or since the previous dose of that medication, whichever window is shorter. I would also like the same to be done in the forward direction if possible, but I am starting with just one direction first. For more clarity, I am essentially trying to discriminate between scenarios B and C in the bottommost image (where there are multiple lab values vs. a single lab value in a 24-hour span).

If anybody has a solution using the data.table package I am very open to it, as I think this is the more efficient and elegant solution in the end. However, I have much more experience with dplyr, so I tried it that way first.

What has been tried

In a previous attempt, I was able to obtain the most recent lab value before the starttime and after the endtime of a given medication. Essentially, I did a Cartesian join and filtered out the extraneous rows using grouping and filtering statements. An example of the initial data frames and the output is shown below.

Below is my attempt to select all lab values back to the previous medication (or 24 hours back), rather than just the single nearest value.

labEventsKExample

    subject_id hadm_id valuenum           charttime
 1:       7216  109208      3.8 2156-09-20 04:00:00
 2:       7216  109208      3.7 2156-09-21 04:00:00
 3:       7216  109208      3.5 2156-09-21 04:00:00
 4:       7216  109208      4.4 2156-09-22 04:00:00
 5:       7216  109208      3.3 2156-09-23 04:00:00
 6:       7216  109208      3.5 2156-09-24 04:00:00
 7:       7216  109208      3.1 2156-09-25 04:00:00
 8:       7216  109208      3.8 2156-09-26 04:00:00
 9:       7216  109208      3.8 2156-09-27 04:00:00
10:       7216  109208      3.2 2156-09-28 04:00:00

repEventsKExample

    subject_id hadm_id linkorderid           starttime             endtime
1:       7216  109208     5810095 2156-09-23 10:00:00 2156-09-23 11:00:00
2:       7216  109208     1068514 2156-09-23 11:45:00 2156-09-23 12:45:00


repEventsKExample %>% 
  inner_join(labEventsKExample, by = c("subject_id", "hadm_id")) %>%
  distinct() %>%
  rename(charttime.lab = charttime) %>%
  collect() -> k_lab_repletions_MV_new_example

k_lab_repletions_MV_new_example %>%
  mutate(isRecentPre = difftime(starttime, charttime.lab, units = "hours") <= 24 & difftime(starttime, charttime.lab, units = "hours") > 0 ) %>%
  mutate(isRecentPost = difftime(endtime, charttime.lab, units = "hours") >= -24 & difftime(endtime, charttime.lab, units = "hours") < 0 )  -> Rep.LE.joined_example 

Rep.LE.joined_example %>%
  filter(isRecentPre) %>% 
  group_by(subject_id, hadm_id,charttime.lab) %>%
  mutate(isMostRecentRepletion = starttime == min(starttime)) %>%
  filter(isMostRecentRepletion) %>%
  ungroup() %>% 
  group_by(subject_id, hadm_id, starttime,endtime) %>%
  arrange(subject_id,starttime) %>%
  mutate(isMostRecentLabEvent = charttime.lab == max(charttime.lab)) %>%
  mutate(recentPreLVs = charttime.lab > dplyr::lag(starttime)) %>%
  filter(recentPreLVs == TRUE)
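Continuing the dplyr route, the pipeline above stops short of the actual count. A minimal sketch of the missing counting step, using small hypothetical tables in place of the full data (the `lab`/`med` names and times here are invented for illustration):

```r
library(dplyr)

# Hypothetical miniature versions of labEventsKExample / repEventsKExample
lab <- tibble(subject_id = 7216L, hadm_id = 109208L,
              charttime = as.POSIXct("2156-09-20 04:00:00", tz = "UTC") +
                          86400 * 0:5)
med <- tibble(subject_id = 7216L, hadm_id = 109208L,
              starttime = as.POSIXct("2156-09-23 10:00:00", tz = "UTC"))

counts <- med %>%
  # Cartesian join within each patient/admission
  inner_join(lab, by = c("subject_id", "hadm_id")) %>%
  # keep lab values strictly before the dose, at most 24 h earlier
  filter(difftime(starttime, charttime, units = "hours") > 0,
         difftime(starttime, charttime, units = "hours") <= 24) %>%
  # one row per dose with the rolling count
  count(subject_id, hadm_id, starttime, name = "n_pre")
```

With these toy times only the 2156-09-23 04:00 lab value falls in the window, so `n_pre` is 1 for the single dose.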

Data

Below is some toy data to try the join method.

structure(list(subject_id = c(7216L, 7216L, 7216L, 7216L, 7216L, 
7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 
7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 7216L, 
7216L, 7216L, 7216L, 7216L), hadm_id = c(109208L, 109208L, 109208L, 
109208L, 109208L, 109208L, 109208L, 109208L, 109208L, 109208L, 
109208L, 109208L, 109208L, 109208L, 109208L, 109208L, 109208L, 
132876L, 132876L, 132876L, 132876L, 132876L, 132876L, 132876L, 
132876L, 132876L, 132876L), valuenum = c(3.8, 3.7, 3.5, 4.4, 
3.3, 3.5, 3.1, 3.8, 3.8, 3.2, 4.4, 4.1, 4.5, 4.1, 4, 4, 3.8, 
3.8, 3.7, 3.1, 3.4, 3.6, 3.5, 3.8, 3, 3.3, 3.1), charttime = structure(c(5892321600, 
5892408000, 5892408000, 5892494400, 5892580800, 5892667200, 5892753600, 
5892840000, 5892926400, 5893012800, 5893012800, 5893099200, 5893099200, 
5893185600, 5893185600, 5893272000, 5893358400, 5817499200, 5817585600, 
5817585600, 5817672000, 5817672000, 5817758400, 5817844800, 5817931200, 
5818017600, 5818104000), tzone = "UTC", class = c("POSIXct", 
"POSIXt"))), row.names = c(NA, -27L), class = c("data.table", 
"data.frame")) -> labEventsKExample

structure(list(subject_id = c(7216L, 7216L), hadm_id = c(109208L, 
109208L), linkorderid = c(5810095L, 1068514L), starttime = structure(c(5892602400, 
5892608700), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    endtime = structure(c(5892606000, 5892612300), class = c("POSIXct", 
    "POSIXt"), tzone = "UTC")), row.names = c(NA, -2L), class = c("data.table", 
"data.frame")) -> repEventsKExample

[Image: timeline scenarios showing multiple vs. a single lab value within a 24-hour window around a medication]

Here's a solution with foverlaps() . This isn't the shortest route to your goal, but sometimes it can help to see the long-form solution and pare back from there.

I use lubridate just for creating the toy data times.

    library(data.table)
    library(lubridate)

    firsttime <- as.POSIXct(today() + hours(1))
    times <- firsttime + hours(1:96)
    data <- data.table(subject_id = 7216,
                       charttimes = times,
                       valuenums = sample(c(3.8, 3.7, 3.5, 4.4, 3.3, 3.5, 3.1, 3.8, 3.8,
                                            3.2, 4.4, 4.1, 4.5, 4.1, 4, 4, 3.8, 3.8, 3.7,
                                            3.1, 3.4, 3.6, 3.5, 3.8, 3, 3.3, 3.1),
                                          replace = TRUE,
                                          size = 96))

    # Pick a random point to act as the reference, around which we want a window
    # of 24 hours on either side
    ref <- data[40]

    # We create the start and end window as variables in the reference data 
    ref[, start_window := charttimes - hours(24)]
    ref[, end_window := charttimes + hours(24)]

    # And we need a duplicate of the chart times in the data for foverlaps()
    data[, charttimes_dup := charttimes]

    # Set the keys, including subject_id and the duplicate chart times
    setkey(data, subject_id, charttimes, charttimes_dup)
    setkey(ref, subject_id, start_window, end_window)

    # foverlaps returns all the matches of charttimes occurring between start
    # and end window. What you want to do with that afterwards can shorten the process.
    data_within_window <- foverlaps(ref, data)

You can do this for multiple subject_ids at the same time, and even for multiple window periods.

data2 <- data.table(subject_id = 7218,
                    charttimes = times,
                    valuenums = sample(c(3.8, 3.7, 3.5, 4.4, 3.3, 3.5, 3.1, 3.8, 3.8,
                                         3.2, 4.4, 4.1, 4.5, 4.1, 4, 4, 3.8, 3.8, 3.7,
                                         3.1, 3.4, 3.6, 3.5, 3.8, 3, 3.3, 3.1),
                                       replace = TRUE,
                                       size = 96))

data2 <- rbindlist(list(data, data2), fill = TRUE)

refs <- data2[c(40, 100)]
refs[, charttimes_dup := NULL]

refs[, start_window := charttimes - hours(24)]
refs[, end_window := charttimes + hours(24)]
data2[, charttimes_dup := charttimes]
setkey(data2, subject_id, charttimes, charttimes_dup)
setkey(refs, subject_id, start_window, end_window)
data_within_window <- foverlaps(refs, data2)
data_within_window[, .N, subject_id]



#    subject_id  N
# 1:       7216 49
# 2:       7218 28
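The two-sided window above can be adapted to the asker's one-sided goal (count lab values in the 24 hours before each starttime). A sketch with hypothetical miniature data, not taken from the original answer:

```r
library(data.table)

# Hypothetical toy tables standing in for the lab and medication data
lab <- data.table(subject_id = 7216L,
                  charttime = as.POSIXct("2156-09-20 04:00:00", tz = "UTC") +
                              86400 * 0:5)
med <- data.table(subject_id = 7216L,
                  starttime = as.POSIXct("2156-09-23 10:00:00", tz = "UTC"))

# The window runs from 24 h before the dose up to the dose itself
med[, start_window := starttime - 24 * 3600]
med[, end_window := starttime]

# foverlaps() needs an interval in both tables; for point events the
# start and end of the interval are the same timestamp
lab[, charttime_dup := charttime]
setkey(lab, subject_id, charttime, charttime_dup)
setkey(med, subject_id, start_window, end_window)

# nomatch = 0L drops doses with no lab value in the window, so .N gives
# the per-dose count (note both window endpoints are inclusive)
pre_counts <- foverlaps(med, lab, nomatch = 0L)[, .N, by = .(subject_id, starttime)]
```

With these toy times, only the 2156-09-23 04:00 lab value overlaps the window, giving a count of 1 for the single dose.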
