
Marking outliers in time series data

I have a data frame with thousands of rows for a particular lab value, where each row represents one instance of a patient having the lab taken. I am interested in how this value changes over time after a surgery. If the value rises and then falls back to baseline within an acute time period, I need to exclude the rise; if it rises and stays above baseline, I need to keep those values. I can already mark whether the value rises past a certain threshold within a time period, but I'm unsure how to code whether it returns to baseline within a particular range of time. My ultimate goal is to use geom_smooth to trend the value over time by procedure type, but I need to exclude these outliers for my graphs to be correct. Any help would be much appreciated!

My data is organized like this:

lab Date    Lab value   study ID   Acutely Past threshold
1/1/2001    2           1          NA
4/1/2001    2.3         1          N
5/2/2002    2.3         1          N
4/8/2018    1           2          NA
4/9/2018    3.8         2          Y
4/15/2018   1           2          N
5/1/2016    1.0         3          NA
5/2/2016    1.2         3          N
4/1/1997    1.0         4          NA
4/4/1997    2.5         4          Y
5/5/1997    2.5         4          N

For future reference, when posting data it is better to use dput so that the example is reproducible. I think something like this might work. You need to identify the "episodes" in which the value went over the threshold. In this code, normal_level and threshold_you_want are placeholders for your own cutoffs, and the output I think you're looking for is the episode column.

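library(dplyr)

df %>%
  group_by(id) %>%
  mutate(
    # 0 where the value is at or below normal; otherwise the index of the run of consecutive elevated values
    potential_episode_grp = (lab_value > normal_level) * data.table::rleid(lab_value > normal_level)
  ) %>%
  group_by(id, potential_episode_grp) %>%
  # episode = 1 for every row of an elevated run that exceeds the threshold at least once
  mutate(episode = as.integer(potential_episode_grp > 0 & any(lab_value > threshold_you_want))) %>%
  ungroup()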
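The episode flag marks rises but does not yet say whether the value came back down in time. Here is a rough sketch of one way to add that, not a definitive solution. It assumes the first recorded value per patient is the baseline, that "elevated" means more than 1.5 times baseline, and that "acute" means a return within 30 days. The column names lab_date and lab_value, the two cutoffs, and the data frame itself (typed in from the table in the question) are all my assumptions. Runs are numbered with cumsum as a dplyr-only stand-in for data.table::rleid.

```r
library(dplyr)

# Example data typed in from the question's table (column names are assumed).
df <- data.frame(
  id        = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4),
  lab_date  = as.Date(c("2001-01-01", "2001-04-01", "2002-05-02",
                        "2018-04-08", "2018-04-09", "2018-04-15",
                        "2016-05-01", "2016-05-02",
                        "1997-04-01", "1997-04-04", "1997-05-05")),
  lab_value = c(2, 2.3, 2.3, 1, 3.8, 1, 1.0, 1.2, 1.0, 2.5, 2.5)
)

# Assumed cutoffs; replace with the clinical definitions actually in use.
rise_ratio  <- 1.5  # "elevated" = more than 1.5 x the patient's baseline
window_days <- 30   # "acute" = back at baseline within 30 days of the rise

# Label each row as elevated or not, and number consecutive runs per patient.
runs <- df %>%
  arrange(id, lab_date) %>%
  group_by(id) %>%
  mutate(
    baseline = first(lab_value),  # assumption: first value is the baseline
    elevated = lab_value > rise_ratio * baseline,
    run_id   = cumsum(elevated != lag(elevated, default = first(elevated))) + 1
  ) %>%
  ungroup()

# One row per run. Runs alternate between elevated and not elevated, so the
# start of the next run after an elevated one is the return to baseline.
run_summary <- runs %>%
  group_by(id, run_id, elevated) %>%
  summarise(run_start = min(lab_date), .groups = "drop") %>%
  group_by(id) %>%
  arrange(run_id, .by_group = TRUE) %>%
  mutate(
    days_to_return = as.numeric(difftime(lead(run_start), run_start, units = "days")),
    # A rise with no later non-elevated measurement has NA here and is kept.
    transient = elevated & !is.na(days_to_return) & days_to_return <= window_days
  ) %>%
  ungroup()

# Attach the run-level flag to every measurement, then drop transient rises.
flagged <- runs %>%
  left_join(select(run_summary, id, run_id, transient), by = c("id", "run_id"))

cleaned <- filter(flagged, !transient)
```

With these cutoffs, only the 3.8 reading for study ID 2 is dropped: it falls back to baseline six days later. The 2.5 readings for study ID 4 are kept because no later measurement returns to baseline. The cleaned data frame can then go into ggplot with geom_smooth as planned.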


 