Finding max of data.table column within range of rows from current observation by group
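OK, that title is quite a mouthful, but it's a problem I've solved, and I'm curious whether anyone has a better solution or can generalize it further.
I have a time series as a data.table, and I'm interested in whether an observation is "bucking the trend", so to speak. That is, is the observation greater than the observations in the year before and after it?
To do this, my idea was to build another column that takes the max over the rows just above and below, and then simply check whether a row's value equals that max.
Luckily, my data is regularly spaced, meaning each row is the same amount of time away from its neighbours. I use this fact to specify the window size manually as a number of rows, rather than checking whether each row falls within the time range of interest.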
#######################
# Package Loading
usePackage <- function(p) {
  if (!is.element(p, installed.packages()[,1]))
    install.packages(p, dep = TRUE)
  require(p, character.only = TRUE)
}
packages <- c("data.table","lubridate")
for(package in packages) usePackage(package)
rm(packages,usePackage)
#######################
set.seed(1337)
# creating a data.table
mydt <- data.table(Name = c(rep("Roger",12),rep("Johnny",8),"Mark"),
                   Date = c(seq(ymd('2010-06-15'),ymd('2015-12-15'), by = '6 month'),
                            seq(ymd('2012-06-15'),ymd('2015-12-15'), by = '6 month'),
                            ymd('2015-12-15')))
mydt[ , Value := c(rnorm(12,15,1),rnorm(8,30,2),rnorm(1,100,30))]
setkey(mydt, Name, Date)
# setting the number of rows up or down to check
windowSize <- 2
# applying the windowing max function
mydt[,
     windowMax := unlist(lapply(1:.N, function(x) {
       # row positions within windowSize of row x, restricted to 1..N
       idx <- Filter(function(y) y > 0 & y <= .N,
                     unique(abs(x + (-windowSize:windowSize))))
       max(.SD[idx, Value])
     })),
     by = Name]
# checking if a value is the local max (by window)
mydt[, isMaxValue := windowMax == Value]
mydt
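As you can see, the windowing function is messy, but it does the job. My questions are: do you know a simpler, more concise, or more readable way to do the same thing? And do you know how to generalize this to irregular time series (i.e. where a fixed number of rows is not a fixed time window)? I couldn't get zoo::rollapply to do what I wanted, but I don't have much experience with it (I couldn't get around groups with a single row crashing the function).
Let me know your thoughts, and thanks!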
This doesn't really address the time-window part, but if you want to use zoo::rollapply, you can do the following:
library(zoo) # provides rollapply; not loaded in the question's setup
width <- 2 * windowSize + 1 # One central obs. and two on each side
mydt[, isMaxValue2 := rollapply(Value, width, max, partial = TRUE) == Value, by = Name]
identical(mydt$isMaxValue, mydt$isMaxValue2) # TRUE
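I think this is cleaner than the solution you proposed.
The partial = TRUE argument handles the "boundary effects" when there are fewer than 5 observations in the window.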
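On the time-window part that the rollapply line above leaves open, one possible sketch is to compare dates directly within each group instead of counting rows. The table `dt`, its values, and `halfWidth` are made up for illustration, and the approach is quadratic in group size, so treat it as a starting point for small groups rather than a tuned solution.

```r
library(data.table)

# Hypothetical irregularly spaced series (illustration only)
dt <- data.table(
  Name  = c("A", "A", "A", "A", "B"),
  Date  = as.Date(c("2015-01-01", "2015-03-01", "2016-02-01",
                    "2016-06-01", "2015-12-15")),
  Value = c(1, 5, 3, 4, 10)
)

halfWidth <- 365  # assumed window: this many days on each side

# For each row, take the max of Value over all rows of the same group
# whose Date lies within halfWidth days of that row's Date
dt[, timeWindowMax := vapply(
  seq_len(.N),
  function(i) {
    gap <- abs(as.numeric(difftime(Date, Date[i], units = "days")))
    max(Value[gap <= halfWidth])
  },
  numeric(1)
), by = Name]

dt[, isMaxValue := timeWindowMax == Value]
dt
```

Each row is always inside its own window, so `max` never sees an empty vector, and single-row groups such as "B" need no special handling.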
I think something like rollapply (@hfty's answer) makes more sense, but here's another way:
mydt[, wmax := do.call(pmax, c(
  shift(Value, 2:1, type = "lag"),
  shift(Value, 0:2, type = "lead"),
  list(na.rm = TRUE)
)), by=Name]
This seems to work:
Name Date Value windowMax wmax
1: Johnny 2012-06-14 20:00:00 30.31510 32.97827 32.97827
2: Johnny 2012-12-14 19:00:00 32.97827 32.97827 32.97827
3: Johnny 2013-06-14 20:00:00 29.84842 32.97827 32.97827
4: Johnny 2013-12-14 19:00:00 32.54356 32.97827 32.97827
5: Johnny 2014-06-14 20:00:00 31.28335 33.72532 33.72532
6: Johnny 2014-12-14 19:00:00 31.60152 33.72532 33.72532
7: Johnny 2015-06-14 20:00:00 33.72532 33.72532 33.72532
8: Johnny 2015-12-14 19:00:00 28.90929 33.72532 33.72532
9: Mark 2015-12-14 19:00:00 118.57833 118.57833 118.57833
10: Roger 2010-06-14 20:00:00 15.19249 15.19249 15.19249
11: Roger 2010-12-14 19:00:00 13.55330 16.62230 16.62230
12: Roger 2011-06-14 20:00:00 14.67682 16.62230 16.62230
13: Roger 2011-12-14 19:00:00 16.62230 17.04212 17.04212
14: Roger 2012-06-14 20:00:00 14.31098 17.04212 17.04212
15: Roger 2012-12-14 19:00:00 17.04212 17.08193 17.08193
16: Roger 2013-06-14 20:00:00 15.94378 17.08193 17.08193
17: Roger 2013-12-14 19:00:00 17.08193 17.08193 17.08193
18: Roger 2014-06-14 20:00:00 16.91712 17.08193 17.08193
19: Roger 2014-12-14 19:00:00 14.58519 17.08193 17.08193
20: Roger 2015-06-14 20:00:00 16.03285 16.91712 16.91712
21: Roger 2015-12-14 19:00:00 13.32143 16.03285 16.03285
Name Date Value windowMax wmax
To see how it works, you can look at the vectors before pmax is applied:
mydt[, c(
  shift(Value, 2:1, type = "lag"),
  shift(Value, 0:2, type = "lead")
), by=Name]
# Name V1 V2 V3 V4 V5
# 1: Johnny NA NA 30.31510 32.97827 29.84842
# 2: Johnny NA 30.31510 32.97827 29.84842 32.54356
# 3: Johnny 30.31510 32.97827 29.84842 32.54356 31.28335
# 4: Johnny 32.97827 29.84842 32.54356 31.28335 31.60152
# 5: Johnny 29.84842 32.54356 31.28335 31.60152 33.72532
# 6: Johnny 32.54356 31.28335 31.60152 33.72532 28.90929
# 7: Johnny 31.28335 31.60152 33.72532 28.90929 NA
# 8: Johnny 31.60152 33.72532 28.90929 NA NA
# 9: Mark NA NA 118.57833 NA NA
# 10: Roger NA NA 15.19249 13.55330 14.67682
# 11: Roger NA 15.19249 13.55330 14.67682 16.62230
# 12: Roger 15.19249 13.55330 14.67682 16.62230 14.31098
# 13: Roger 13.55330 14.67682 16.62230 14.31098 17.04212
# 14: Roger 14.67682 16.62230 14.31098 17.04212 15.94378
# 15: Roger 16.62230 14.31098 17.04212 15.94378 17.08193
# 16: Roger 14.31098 17.04212 15.94378 17.08193 16.91712
# 17: Roger 17.04212 15.94378 17.08193 16.91712 14.58519
# 18: Roger 15.94378 17.08193 16.91712 14.58519 16.03285
# 19: Roger 17.08193 16.91712 14.58519 16.03285 13.32143
# 20: Roger 16.91712 14.58519 16.03285 13.32143 NA
# 21: Roger 14.58519 16.03285 13.32143 NA NA
# Name V1 V2 V3 V4 V5
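The window of 2 is hard-coded in the shift calls above. Below is a hedged sketch of how it might be parameterised. The helper `rollMaxByShift` and the table `toy` are hypothetical names for illustration, not part of data.table. The list of shifted vectors is built one offset at a time with lapply, because a single offset per shift call yields a plain vector, and concatenating that straight into the argument list would split it into separate elements.

```r
library(data.table)

# Hypothetical helper: max over the k rows before and after each element
rollMaxByShift <- function(x, k) {
  shifted <- lapply(-k:k, function(off) {
    # negative offsets look back (lag), zero and positive look ahead (lead)
    shift(x, abs(off), type = if (off < 0) "lag" else "lead")
  })
  # NA padding at the group edges is ignored via na.rm = TRUE
  do.call(pmax, c(shifted, list(na.rm = TRUE)))
}

# Made-up data: one group of five rows and one single-row group
toy <- data.table(g = c(rep("a", 5), "b"), v = c(5, 1, 2, 3, 4, 9))
toy[, wmax := rollMaxByShift(v, 1), by = g]
toy
```

With `k = 2` and `by = Name` this should reproduce the `wmax` column above. As in the original, it still counts rows rather than time.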