Ok so that title is quite a mouthful but here's the problem I solved and I was curious if anyone had a better solution or could generalize it further.
I have a time series as a data.table
and I'm interested in finding out if an observation "bucks the trend" so to speak of the data before and after. Ie Is this observation larger than the year of observations before and after ?
To do this, my thought was to build in another column that grabs the max from the rows above or below and then just check if a row is equal to that max.
My data, luckily was regularly ordered, meaning that every row is the same distance of time from it's neighboring row. I use this fact to manually specify window size, rather than having to check if each row is within the time distance of interest.
#######################
# Package Loading
usePackage <- function(p) {
if (!is.element(p, installed.packages()[,1]))
install.packages(p, dep = TRUE)
require(p, character.only = TRUE)
}
packages <- c("data.table","lubridate")
for(package in packages) usePackage(package)
rm(packages,usePackage)
#######################
set.seed(1337)
# creating a data.table
mydt <- data.table(Name = c(rep("Roger",12),rep("Johnny",8),"Mark"),
Date = c(seq(ymd('2010-06-15'),ymd('2015-12-15'), by = '6 month'),
seq(ymd('2012-06-15'),ymd('2015-12-15'), by = '6 month'),
ymd('2015-12-15')))
mydt[ , Value := c(rnorm(12,15,1),rnorm(8,30,2),rnorm(1,100,30))]
setkey(mydt, Name, Date)
# setting the number of rows up or down to check
windowSize <- 2
# applying the windowing max function
mydt[,
windowMax := unlist(lapply(1:.N, function(x) max(.SD[Filter(function(y) y>0 & y <= .N, unique(abs(x+(-windowSize:windowSize)))), Value]))),
by = Name]
# checking if a value is the local max (by window)
mydt[, isMaxValue := windowMax == Value]
mydt
As you can see, the windowing function is a mess but it does the trick. My question is: do you know a simpler, more succinct, or more readable way to do the same thing? Do you know how to generalize this to take irregular time series into account (ie not a fixed window)? I couldn't get the zoo::rollapply
to do what I wanted but I don't have that much experience with it (I couldn't solve the problem of a group with 1 row causing the function to crash).
Let me know your thoughts and thank you!
This doesn't really address the time-window part, but if you want a one-liner with zoo::rollapply
, you can do:
width <- 2 * windowSize + 1 # One central obs. and two on each side
mydt[, isMaxValue2 := rollapply(Value, width, max, partial = TRUE) == Value, by=Name]
identical(mydt$isMaxValue, mydt$isMaxValue2) # TRUE
It's somewhat more legible than your proposed solution, I think.
The partial = TRUE
argument deals with the "border effects" when there are less than 5 observations in the window.
I think something like rollapply
(@hfty's answer) makes more sense, but here's another way:
mydt[, wmax := do.call(pmax, c(
shift(Value, 2:1, type = "lag"),
shift(Value, 0:2, type = "lead"),
list(na.rm = TRUE)
)), by=Name]
which seems to work:
Name Date Value windowMax wmax
1: Johnny 2012-06-14 20:00:00 30.31510 32.97827 32.97827
2: Johnny 2012-12-14 19:00:00 32.97827 32.97827 32.97827
3: Johnny 2013-06-14 20:00:00 29.84842 32.97827 32.97827
4: Johnny 2013-12-14 19:00:00 32.54356 32.97827 32.97827
5: Johnny 2014-06-14 20:00:00 31.28335 33.72532 33.72532
6: Johnny 2014-12-14 19:00:00 31.60152 33.72532 33.72532
7: Johnny 2015-06-14 20:00:00 33.72532 33.72532 33.72532
8: Johnny 2015-12-14 19:00:00 28.90929 33.72532 33.72532
9: Mark 2015-12-14 19:00:00 118.57833 118.57833 118.57833
10: Roger 2010-06-14 20:00:00 15.19249 15.19249 15.19249
11: Roger 2010-12-14 19:00:00 13.55330 16.62230 16.62230
12: Roger 2011-06-14 20:00:00 14.67682 16.62230 16.62230
13: Roger 2011-12-14 19:00:00 16.62230 17.04212 17.04212
14: Roger 2012-06-14 20:00:00 14.31098 17.04212 17.04212
15: Roger 2012-12-14 19:00:00 17.04212 17.08193 17.08193
16: Roger 2013-06-14 20:00:00 15.94378 17.08193 17.08193
17: Roger 2013-12-14 19:00:00 17.08193 17.08193 17.08193
18: Roger 2014-06-14 20:00:00 16.91712 17.08193 17.08193
19: Roger 2014-12-14 19:00:00 14.58519 17.08193 17.08193
20: Roger 2015-06-14 20:00:00 16.03285 16.91712 16.91712
21: Roger 2015-12-14 19:00:00 13.32143 16.03285 16.03285
Name Date Value windowMax wmax
To see how it works, one could look at the vectors before the pmax
is taken:
mydt[, c(
shift(Value, 2:1, type = "lag"),
shift(Value, 0:2, type = "lead")
), by=Name]
# Name V1 V2 V3 V4 V5
# 1: Johnny NA NA 30.31510 32.97827 29.84842
# 2: Johnny NA 30.31510 32.97827 29.84842 32.54356
# 3: Johnny 30.31510 32.97827 29.84842 32.54356 31.28335
# 4: Johnny 32.97827 29.84842 32.54356 31.28335 31.60152
# 5: Johnny 29.84842 32.54356 31.28335 31.60152 33.72532
# 6: Johnny 32.54356 31.28335 31.60152 33.72532 28.90929
# 7: Johnny 31.28335 31.60152 33.72532 28.90929 NA
# 8: Johnny 31.60152 33.72532 28.90929 NA NA
# 9: Mark NA NA 118.57833 NA NA
# 10: Roger NA NA 15.19249 13.55330 14.67682
# 11: Roger NA 15.19249 13.55330 14.67682 16.62230
# 12: Roger 15.19249 13.55330 14.67682 16.62230 14.31098
# 13: Roger 13.55330 14.67682 16.62230 14.31098 17.04212
# 14: Roger 14.67682 16.62230 14.31098 17.04212 15.94378
# 15: Roger 16.62230 14.31098 17.04212 15.94378 17.08193
# 16: Roger 14.31098 17.04212 15.94378 17.08193 16.91712
# 17: Roger 17.04212 15.94378 17.08193 16.91712 14.58519
# 18: Roger 15.94378 17.08193 16.91712 14.58519 16.03285
# 19: Roger 17.08193 16.91712 14.58519 16.03285 13.32143
# 20: Roger 16.91712 14.58519 16.03285 13.32143 NA
# 21: Roger 14.58519 16.03285 13.32143 NA NA
# Name V1 V2 V3 V4 V5
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.