简体   繁体   中英

How to summarise time-series data with unequal number of observations with R

I have a large dataframe (86000 rows) consisting of several patients, each of which had several occasions of blood tests (just 3 tests: T1, T2 and T3) during their stay. Some of these patients were hospitalised for 3 days, some for 168 days.

This is just a fraction of the output from count function , which shows the big variation of the days spent in the hospital:

No  Id     Days
148 29757  111
149 30368   36
150 31062   29
151 31993   24
152 32198   51
153 32438    6
154 32836   74
155 32944   24
156 33467   39
157 36108   90
158 36849    6
159 37136    3

I used aggregate to calculate the means etc, but I would like to see a summary of who did improve or deteriorate during their stay.

I think that this would involve to extract at least the first and last tests, and take the difference (the lower, the better). But I couldn't find a way to do that.

I thought that an easier solution would be to transform the whole results to ordered data (according to the normal ranges of the tests) and see how many of them had abnormally low or high values. Unfortunately almost every patient has lows and highs.

Ideally, I would like to see the progress of several patients (or groups of paients) over time. But since they were hospitalised in different time-frames, the (over-simplified) result is like that:

仅2位患者的结果,他们在完全不同的时间范围内住院

As you can see, the first patient (red dots) started with mediocre values, quickly worsened (high values), and then improved (lower values). The progress of the second patient is not clear, since his/her stay was probably short.

Could someone suggest a starter (code or idea)? I checked some questions about multiple time-series plots with unequal observations , but they are not helpful in my case. An example (anonymised) dataset is here:

structure(list(Id = c("10200", "10200", "10200", "10200", "10200", 
"10200", "10700", "10700", "10700", "10700", "10700", "10700", 
"10700", "10700", "10700", "10700", "10700", "10700", "10700", 
"10700", "10700", "10766", "10766", "10766", "10766", "10766", 
"10766", "10766", "10766", "10766", "10766", "10766", "10766", 
"10766", "10766", "10766", "10766", "10766", "10766", "10766"
), Date = structure(c(15068, 15068, 15068, 15069, 15069, 15069, 
15072, 15072, 15072, 15072, 15072, 15072, 15073, 15073, 15073, 
15075, 15075, 15075, 15078, 15078, 15078, 15073, 15074, 15074, 
15075, 15075, 15075, 15075, 15076, 15076, 15076, 15078, 15078, 
15078, 15081, 15082, 15083, 15084, 15085, 15085), class = "Date"), 
    Test = c("T1", "T2", "T3", "T1", "T2", "T3", "T1", "T1", 
    "T2", "T2", "T3", "T3", "T1", "T2", "T3", "T1", "T2", "T3", 
    "T1", "T2", "T3", "T1", "T1", "T2", "T1", "T1", "T2", "T2", 
    "T1", "T2", "T3", "T1", "T2", "T3", "T1", "T1", "T2", "T1", 
    "T1", "T2"), Result = c(131, 4.53, 5.4, 108, 3.19, 3.7, 125, 
    NA, 1.26, NA, NA, 3.8, 125, 0.97, 4.2, 73, 0.84, 6.6, 48, 
    0.52, 4.8, 60, 75, 0.83, 52, 51, 0.62, 0.65, 40, 0.57, 4.1, 
    45, 0.54, 3.7, 96, 77, 1.04, 134, 144, 0.95)), .Names = c("Id", 
"Date", "Test", "Result"), row.names = c(3L, 6L, 4L, 2L, 1L, 
5L, 10L, 14L, 9L, 19L, 8L, 11L, 20L, 18L, 7L, 17L, 13L, 21L, 
12L, 15L, 16L, 22L, 28L, 29L, 24L, 31L, 26L, 33L, 34L, 32L, 37L, 
23L, 35L, 25L, 38L, 36L, 30L, 27L, 39L, 40L), class = "data.frame")

I don't know if this is what you want, but you can use dplyr package. The code below will group the data by "Id", then find the first & last values in Result and finally calculate the "difference" in a new column

mydata <- structure(list(Id=c ( "10200", "10200", "10200", "10200", "10200", "10200", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10766", "10766", "10766",
"10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766" ), Date=s tructure(c(15068, 15068, 15068, 15069, 15069, 15069, 15072, 15072, 15072, 15072, 15072, 15072, 15073, 15073,
15073, 15075, 15075, 15075, 15078, 15078, 15078, 15073, 15074, 15074, 15075, 15075, 15075, 15075, 15076, 15076, 15076, 15078, 15078, 15078, 15081, 15082, 15083, 15084, 15085, 15085), class="Date" ), Test=c ( "T1", "T2", "T3", "T1", "T2", "T3", "T1",
"T1", "T2", "T2", "T3", "T3", "T1", "T2", "T3", "T1", "T2", "T3", "T1", "T2", "T3", "T1", "T1", "T2", "T1", "T1", "T2", "T2", "T1", "T2", "T3", "T1", "T2", "T3", "T1", "T1", "T2", "T1", "T1", "T2"), Result=c (131, 4.53, 5.4, 108, 3.19, 3.7, 125, NA, 1.26,
NA, NA, 3.8, 125, 0.97, 4.2, 73, 0.84, 6.6, 48, 0.52, 4.8, 60, 75, 0.83, 52, 51, 0.62, 0.65, 40, 0.57, 4.1, 45, 0.54, 3.7, 96, 77, 1.04, 134, 144, 0.95)), .Names=c ( "Id", "Date", "Test", "Result"), row.names=c (3L, 6L, 4L, 2L, 1L, 5L, 10L, 14L, 9L, 19L,
8L, 11L, 20L, 18L, 7L, 17L, 13L, 21L, 12L, 15L, 16L, 22L, 28L, 29L, 24L, 31L, 26L, 33L, 34L, 32L, 37L, 23L, 35L, 25L, 38L, 36L, 30L, 27L, 39L, 40L), class="data.frame" )

library(dplyr) 

result <- mydata %>%
  group_by(Id) %>%  
  summarise_each(funs(first, last), Result) %>%
  mutate(difference = first - last)
result

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM