R: Calculations over consecutive days?
This question builds upon this earlier question:
R: Calculate moving maximum slope by week accounting for factors
My question:
The code pasted below calculates the maximum slope over a 7-day period, using length(HDD) to skip groups with fewer than 7 rows. I would like to be more discriminating: I only want MaxSlope calculated for periods of 7 consecutive days.
For example, a gap exists in the data from 2004-12-26 to 2004-12-30. Considering only the portion of data I have copied here, MaxSlope should only be calculated for 2004-12-23 and 2004-12-24. All other dates should have "NA" inserted. This dataset will grow to several million records, hence efficiency is important.
NOTE: I subset my data.frame to provide only the columns important here. The by statement in the MaxSlope code is important, as the code is applied to the entire data.frame.
I have no idea where to begin with consecutive date calculations. Any ideas?
Thank you!
Code I used to arrive at the maximum slope calculation:
RawByDayALL <- data.table(RawByDayALL)
RawByDayALL[, MaxSlope := if(length(HDD)<7) {rep(NA_real_, length(HDD))} else {filter(HDD, c(1,1,1,1,1,1,0)/7)}, by=list(WinterID, SiteID, SubstrateConcat)]
RawByDayALL[is.na(MaxSlope), MaxSlope := -99L]
Structure of my data:
> dput(RawByDayALL[650:660])
structure(list(WinterID = structure(c(6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L), .Label = c("2002", "2002_2003", "2003",
"2003_2004", "2004", "2004_2005", "2005", "2005_2006", "2006",
"2006_2007", "2007", "2007_2008", "2008"), class = "factor"),
Date = structure(c(12771, 12772, 12773, 12774, 12775, 12776,
12777, 12778, 12782, 12783, 12784), class = "Date"), SiteID = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "NW_SB", class = "factor"),
SubstrateConcat = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("B_A", "B_B"), class = "factor"),
HDD = c(17.3533333333333, 35.1066666666667, 82.6266666666667,
51.68, 36.22, 39.6066666666667, 38.0533333333333, 47.8333333333333,
4.18, 9.66, 1.5), MaxSlope = c(30.4104761904762, 33.3885714285714,
37.5133333333333, 40.4704761904762, 42.2885714285714, 31.0819047619048,
25.0790476190476, 20.1190476190476, 14.6019047619048, 9.19428571428571,
2.6552380952381)), .Names = c("WinterID", "Date", "SiteID",
"SubstrateConcat", "HDD", "MaxSlope"), class = c("data.table",
"data.frame"), row.names = c(NA, -11L), .internal.selfref = <pointer: 0x0000000000100788>)
What a portion of the data looks like:
WinterID Date SiteID SubstrateConcat HDD MaxSlope
650 2004_2005 2004-12-19 NW_SB B_B 17.35333333 30.41047619
651 2004_2005 2004-12-20 NW_SB B_B 35.10666667 33.38857143
652 2004_2005 2004-12-21 NW_SB B_B 82.62666667 37.51333333
653 2004_2005 2004-12-22 NW_SB B_B 51.68000000 40.47047619
654 2004_2005 2004-12-23 NW_SB B_B 36.22000000 42.28857143
655 2004_2005 2004-12-24 NW_SB B_B 39.60666667 31.08190476
656 2004_2005 2004-12-25 NW_SB B_B 38.05333333 25.07904762
657 2004_2005 2004-12-26 NW_SB B_B 47.83333333 20.11904762
658 2004_2005 2004-12-30 NW_SB B_B 4.18000000 14.60190476
659 2004_2005 2004-12-31 NW_SB B_B 9.66000000 9.19428571
660 2004_2005 2005-01-01 NW_SB B_B 1.50000000 2.65523810
EDITED to include the answer provided by @eddi. Thank you for the simple fix!
RawByDayALL <- data.table(RawByDayALL)
RawByDayALL[, MaxSlope := if(length(HDD)<7) {rep(NA_real_, length(HDD))} else {filter(HDD, c(1,1,1,1,1,1,0)/7)}, by=list(WinterID, SiteID, SubstrateConcat, cumsum(diff(c(Date[1], as.IDate(Date))) > 1))]
RawByDayALL[is.na(MaxSlope), MaxSlope := -99L]
This will give you the consecutive day grouping that you need:
dt[, cumsum(diff(c(Date[1], as.IDate(Date))) > 1)]
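To see why that expression works, here is a minimal base R sketch using the dates from the sample rows in the question. The variable names (dates, gaps, run_id) are made up for illustration. The first date is prepended so that diff() returns one value per row, and cumsum() then turns each "gap larger than one day" flag into a new run number.

```r
# Dates from the sample rows: 2004-12-19 .. 2004-12-26, then 2004-12-30 .. 2005-01-01
dates <- as.Date(c("2004-12-19", "2004-12-20", "2004-12-21", "2004-12-22",
                   "2004-12-23", "2004-12-24", "2004-12-25", "2004-12-26",
                   "2004-12-30", "2004-12-31", "2005-01-01"))

# Days elapsed since the previous row; the first row is compared with itself (0)
gaps <- as.numeric(diff(c(dates[1], dates)))

# TRUE wherever a run of consecutive days is broken; the running sum of
# those flags gives every consecutive run its own id
run_id <- cumsum(gaps > 1)

print(data.frame(dates, gaps, run_id))
```

The eight days up to 2004-12-26 get run id 0, and the three days from 2004-12-30 onward get run id 1, so they can no longer be pooled into one 7-day window.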
And this is how you can put it in your by, in addition to your other columns:
dt[, your_calculation,
by = list(various_columns, cumsum(diff(c(Date[1], as.IDate(Date))) > 1))]
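Putting the pieces together, the following is a sketch on a toy table that mirrors the sample rows from the question (HDD values rounded). The helper roll7 and the column RunID are hypothetical names introduced here, and sorting with setkey first is an assumption: the gap test only makes sense if each group's dates are in order.

```r
library(data.table)

# Toy table mirroring the sample rows (a single WinterID/SiteID/SubstrateConcat group)
dt <- data.table(
  WinterID        = "2004_2005",
  SiteID          = "NW_SB",
  SubstrateConcat = "B_B",
  Date            = as.Date("2004-12-19") + c(0:7, 11:13),  # gap after 2004-12-26
  HDD             = c(17.35, 35.11, 82.63, 51.68, 36.22, 39.61,
                      38.05, 47.83, 4.18, 9.66, 1.50)
)

# Same window as in the question. stats::filter is spelled out so that a
# masking filter() from another package cannot be picked up by accident.
roll7 <- function(x) {
  if (length(x) < 7) return(rep(NA_real_, length(x)))
  as.numeric(stats::filter(x, c(1, 1, 1, 1, 1, 1, 0) / 7))
}

# Sort so that the gap test sees each group's dates in ascending order
setkey(dt, WinterID, SiteID, SubstrateConcat, Date)

# Run id over the whole table: increments at every gap of more than one day
dt[, RunID := cumsum(diff(c(Date[1], Date)) > 1)]

# The window is now applied within each run of consecutive days only
dt[, MaxSlope := roll7(HDD), by = list(WinterID, SiteID, SubstrateConcat, RunID)]

print(dt[, list(Date, HDD, RunID, MaxSlope)])
```

On this toy data the 8-day run is long enough for the filter to produce two non-NA values in its middle rows, while the 3-day run after the gap is all NA. Storing the run id as a column is optional; it is done here only so the grouping is easy to inspect.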