Moving average or sum calculation on multiple vectors in irregular time series

I have a data frame that looks like this (dput is below):

Date        SiteSub       HeatingDegreeDay MCnt  MCnt_lag7  MCnt_lag9
2009-11-01  EC_BC.Z_Z     0.00             0     0          0
2009-11-02  EC_BC.Z_Z     0.00             0     0          0
2009-11-03  EC_BC.Z_Z     0.00             0     0          0
2009-11-04  EC_BC.Z_Z     0.00             0     0          0
2009-11-05  EC_BC.Z_Z     0.00             0     0          0
2009-11-06  EC_BC.Z_Z     0.00             0     0          0
2009-11-07  EC_BC.Z_Z     0.00             1     0          0

I am trying to calculate moving sums or averages of width 7 for HeatingDegreeDay, MCnt, MCnt_lag7 and MCnt_lag9 in this data frame. Some characteristics of the data to consider: it is an irregular time series with missing dates, and the HeatingDegreeDay vector contains NA values.

Once I have the 7-day moving sums or averages, I need to calculate correlation coefficients to help me identify which lag (7-day or 9-day) best matches the HeatingDegreeDay vector.

Question: Can the moving sum or average calculation be combined with a correlation coefficient calculation in the same code, or do they need to be done in steps? If they can be combined, how?

Problems: I keep running into trouble when calculating the moving sums or averages. First, I cannot pass multiple vectors to rollapply, as it seems to be univariate. Second, with TTR's SMA I get an "incorrect number of dimensions" error. I can't use rollmean because my data has NAs.

I have looked at "R: How to apply moving averages to subset of columns in a data frame?" and "Conditional rolling mean (moving average) on irregular time series".

I tried:

#Calculate moving average 
Lag0910_79 <- as.numeric(Lag0910_79$HeatingDegreeDay, Lag0910_79$MCnt7, Lag0910_79$MCnt9)
Lagzoo <- as.zoo(Lag0910_79)
Lagzoo_7 <- rollapply(Lagzoo, width=7, mean, na.rm=TRUE)
Lagzoo_7 <- as.data.frame(Lagzoo_7)

with result:

dput(head(Lagzoo_7, 15))

structure(list(Lagzoo_7 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1)), .Names = "Lagzoo_7", row.names = c("4", "5", "6", 
"7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", 
"18"), class = "data.frame")

and:

Lagzoo.ttr <- SMA(Lag0910_79[, "HeatingDegreeDay"], 7)

Error in Lag0910_79[, "HeatingDegreeDay"] : incorrect number of dimensions

How can I make this work? Clearly I don't have it right. Thanks for your help!

My data is structured like this:

structure(list(Date = structure(c(14549, 14550, 14551, 14552, 
14553, 14554, 14555, 14556, 14557, 14558, 14559, 14560, 14561, 
14562, 14563, 14564, 14565, 14566, 14567, 14568, 14569, 14570, 
14571, 14572, 14573, 14574, 14575, 14576, 14577, 14578, 14579, 
14580, 14581, 14582, 14583, 14584, 14585, 14586, 14587, 14588, 
14589, 14590, 14591, 14592, 14593, 14594, 14595, 14596, 14597, 
14598, 14599, 14600, 14601, 14602, 14603, 14604, 14605, 14606, 
14607, 14608, 14609, 14610, 14611, 14612, 14613, 14614, 14615, 
14616, 14617, 14618, 14619, 14620, 14620, 14620, 14621, 14622, 
14622, 14623, 14624, 14625, 14626, 14627, 14628, 14629, 14629, 
14629, 14629, 14629, 14630, 14631, 14631, 14631, 14632, 14632, 
14632, 14632, 14632, 14632, 14632, 14633, 14633, 14633, 14634, 
14634, 14634, 14634, 14635, 14635, 14635, 14635, 14636, 14636, 
14636, 14636, 14636, 14636, 14637, 14637, 14637, 14638, 14638, 
14638, 14639, 14639, 14640, 14641, 14642, 14643, 14643, 14644, 
14645, 14646, 14647, 14648, 14649, 14650, 14651, 14652, 14653, 
14654, 14655, 14656, 14657, 14658, 14659, 14660, 14661, 14661, 
14662, 14663, 14663, 14664, 14665, 14666, 14667, 14668, 14669, 
14669, 14670, 14671, 14672, 14673, 14674, 14675, 14675, 14676, 
14677, 14678, 14678, 14679, 14680, 14681, 14681, 14681, 14682, 
14682, 14682, 14683, 14684, 14685, 14686, 14687, 14688, 14689, 
14689, 14690, 14691, 14692, 14693, 14694, 14694, 14694, 14695, 
14696, 14697, 14698, 14699, 14700, 14701, 14702, 14703, 14703, 
14703, 14703, 14704, 14704, 14705, 14706), class = "Date"), SiteSub = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "EC_BC.Z_Z", class = "factor"), 
    HeatingDegreeDay = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 3L, 4L, 9L, 11L, 
    14L, 15L, 12L, 13L, 17L, 16L, 16L, 16L, 10L, 8L, 8L, 7L, 
    6L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("0.00", 
    "0.02", "0.05", "0.14", "0.32", "0.50", "0.89", "0.96", "0.98", 
    "1.02", "1.04", "1.30", "1.40", "1.49", "1.50", "1.58", "1.86"
    ), class = "factor"), MCnt = structure(c(1L, 1L, 1L, 1L, 
    1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 
    2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 
    1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 
    1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
    1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 
    2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor"), 
    MCnt_lag7 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 
    2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 
    2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 
    2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 
    2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, NA, NA, NA, 
    NA, NA, NA, NA), .Label = c("0", "1"), class = "factor"), 
    MCnt_lag9 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 
    1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 
    1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 
    1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
    2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 
    2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 
    1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), .Label = c("0", "1"), class = "factor")), .Names = c("Date", 
"SiteSub", "HeatingDegreeDay", "MCnt", "MCnt_lag7", "MCnt_lag9"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", 
"21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", 
"32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", 
"43", "44", "45", "46", "47", "48", "49", "50", "51", "52", "53", 
"54", "55", "56", "57", "58", "59", "60", "61", "62", "63", "64", 
"65", "66", "67", "68", "69", "70", "71", "72", "73", "74", "75", 
"76", "77", "78", "79", "80", "81", "82", "83", "84", "85", "86", 
"87", "88", "89", "90", "91", "92", "93", "94", "95", "96", "97", 
"98", "99", "100", "101", "102", "103", "104", "105", "106", 
"107", "108", "109", "110", "111", "112", "113", "114", "115", 
"116", "117", "118", "119", "120", "121", "122", "123", "124", 
"125", "126", "127", "128", "129", "130", "131", "132", "133", 
"134", "135", "136", "137", "138", "139", "140", "141", "142", 
"143", "144", "145", "146", "147", "148", "149", "150", "151", 
"152", "153", "154", "155", "156", "157", "158", "159", "160", 
"161", "162", "163", "164", "165", "166", "167", "168", "169", 
"170", "171", "172", "173", "174", "175", "176", "177", "178", 
"179", "180", "181", "182", "183", "184", "185", "186", "187", 
"188", "189", "190", "191", "192", "193", "194", "195", "196", 
"197", "198", "199", "200", "201", "202", "203", "204", "205", 
"206", "207", "208"), class = "data.frame")
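
A likely explanation for the two symptoms in the question, sketched with made-up values: the dput output shows that HeatingDegreeDay and the MCnt columns are factors. Calling as.numeric() directly on a factor returns its internal level codes rather than the numbers in the labels, which would account for the rolling mean of all 1s. The result is also a plain vector rather than a data frame, so subsetting it with [, "HeatingDegreeDay"] fails. The names hdd, codes, values and err are illustrative only.

```r
# Toy factor column mimicking HeatingDegreeDay (values are made up)
hdd <- factor(c("0.00", "0.00", "1.50", "0.32"))

# as.numeric() on a factor gives the integer level codes, not the values
codes <- as.numeric(hdd)

# Going through as.character() recovers the numbers the labels represent
values <- as.numeric(as.character(hdd))

print(codes)
print(values)

# A plain vector has no columns, so two-index subsetting raises an error
err <- tryCatch(codes[, "HeatingDegreeDay"],
                error = function(e) conditionMessage(e))
print(err)
```

Converting each factor column with as.numeric(as.character(x)) before any rolling calculation avoids both problems.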

If the data frame shown in the dput output of the question is DF, then the following converts columns 3:6 to numeric and performs the rolling mean calculation, producing rmean, a matrix of rolling means. It then uses corNA to produce a vector, rcor, of rolling correlations between the first two columns of the matrix (HeatingDegreeDay and MCnt), and puts everything into one data frame, DF3:

library(zoo)

DF2 <- DF
DF2[3:6] <- lapply(DF2[3:6], function(x) as.numeric(as.character(x)))
m <- as.matrix(DF2[3:6])
rmean <- rollapplyr(m, 7, mean, na.rm = TRUE, fill = NA) # mean matrix

corNA <- function(x) {
    x <- na.omit(x[, 1:2])
    if (nrow(x) < 2 || sd(x[,1]) == 0 || sd(x[,2]) == 0) return(NA)
    cor(x[, 1], x[,2])
}

rcor <- rollapplyr(m, 7, corNA, by.column = FALSE, fill = NA) # vector of cors

DF3 <- data.frame(DF2, rmean, rcor) # put it all together
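
For the question's second step (deciding whether the 7-day or 9-day lag matches HeatingDegreeDay better), one option is to correlate the rolling means with each other over the whole series and compare the two coefficients. The sketch below rests on assumptions: it treats "best" as "highest correlation between 7-row rolling means", it uses made-up data, and roll_mean is a hypothetical base R helper standing in for rollapplyr so that the example runs without zoo.

```r
# Trailing rolling mean over k rows that skips NAs (hypothetical helper)
roll_mean <- function(x, k = 7) {
  vapply(seq_along(x), function(i) {
    if (i < k) return(NA_real_)            # not enough rows yet
    w <- x[(i - k + 1):i]
    if (all(is.na(w))) NA_real_ else mean(w, na.rm = TRUE)
  }, numeric(1))
}

# Made-up data: MCnt_lag7 tracks HeatingDegreeDay, MCnt_lag9 is offset by 2 rows
n <- 110
signal <- as.numeric(seq_len(n) %% 11 < 4)
toy <- data.frame(
  HeatingDegreeDay = signal,
  MCnt_lag7        = signal,
  MCnt_lag9        = c(signal[-(1:2)], NA, NA)
)
toy$HeatingDegreeDay[20] <- NA             # one missing value, as in the question

# Rolling means of every column
rm7 <- as.data.frame(lapply(toy, roll_mean, k = 7))

# Correlate each lag's rolling mean with the HeatingDegreeDay rolling mean
lag_cor <- sapply(c("MCnt_lag7", "MCnt_lag9"), function(nm)
  cor(rm7$HeatingDegreeDay, rm7[[nm]], use = "complete.obs"))

print(round(lag_cor, 3))
best_lag <- names(which.max(lag_cor))
print(best_lag)
```

Rolling means over overlapping windows are strongly autocorrelated, so these coefficients are better read as a ranking of the lags than as a significance test.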

The zoo version is below. Since zoo requires unique dates, rows with equal dates are aggregated:

z <- read.zoo(DF2[-2], aggregate = mean) # can omit aggregate=mean if dates are unique

zmean <- rollapplyr(z, 7, mean, na.rm = TRUE, fill = NA) # means
zcor <- rollapplyr(z, 7, corNA, by.column = FALSE, fill = NA) # cors

z2 <- merge(z, zmean, zcor) # omit this if separate objects are ok
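
Note that a width of 7 in rollapplyr counts rows, not calendar days, so with missing dates a window can span more than a week. If a true 7-day window is wanted, one approach is to collapse duplicate dates and then pad the series to one row per day before rolling. The following base R sketch shows that preparation step on made-up data; raw, daily, all_days and regular are illustrative names.

```r
# Made-up irregular series: 2009-11-03 appears twice,
# 2009-11-04 and 2009-11-05 are missing
raw <- data.frame(
  Date = as.Date(c("2009-11-01", "2009-11-02", "2009-11-03",
                   "2009-11-03", "2009-11-06")),
  HeatingDegreeDay = c(0.0, 0.5, 1.0, 2.0, 0.3),
  MCnt             = c(0, 1, 1, 0, 1)
)

# Step 1: one row per date, averaging rows that share a date
daily <- aggregate(raw[-1], by = list(Date = raw$Date), FUN = mean)

# Step 2: a complete daily calendar covering the observed range
all_days <- data.frame(Date = seq(min(daily$Date), max(daily$Date), by = "day"))

# Step 3: left join, so missing days become rows of NA
regular <- merge(all_days, daily, by = "Date", all.x = TRUE)
print(regular)
```

After this step, a rolling function with na.rm = TRUE and width 7 covers exactly seven calendar days per window.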

Is this what you wanted for the rolling mean?

# Convert dates to days
aa = as.Date(x$Date)
x$Date = as.numeric(aa - aa[1])

# I think it's easier to get rid of the factors
factor2number = function(x) as.numeric(as.character(x))
x[,3:6] = apply(x[,3:6],2,factor2number)

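# A rolling mean over a time window: for each row, average every y whose
# day number lies within +/- width days of that row (a centered window
# spanning 2*width + 1 days, not a trailing window of `width` rows)
rollmean_r = function(x,y,width) {
  out = numeric(length(x))
  for( i in seq_along(x) ) {
    window = x >= (x[i]-width) & x <= (x[i]+width)
    # mean() instead of .Internal(mean()), which is not meant for user code;
    # na.rm = TRUE skips NA values inside the window
    out[i] = mean( y[window], na.rm = TRUE )
  }
  return(out)
}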

# Calculate the rolling means
x[,3:6] = apply(x[3:6], 2, function(y) rollmean_r(x$Date,y,7) )
x
#    Date   SiteSub HeatingDegreeDay       MCnt MCnt_lag7  MCnt_lag9
# 1     0 EC_BC.Z_Z                0 0.12500000         0         0
# 2     1 EC_BC.Z_Z                0 0.11111111         0         0
# 3     2 EC_BC.Z_Z                0 0.10000000         0         0
# 4     3 EC_BC.Z_Z                0 0.09090909         0         0
# 5     4 EC_BC.Z_Z                0 0.08333333         0         0
# 6     5 EC_BC.Z_Z                0 0.07692308         0         0
# 7     6 EC_BC.Z_Z                0 0.07142857         0         0
# 8     7 EC_BC.Z_Z                0 0.06666667         0         0
# 9     8 EC_BC.Z_Z                0 0.07142857         0         0
# 10    9 EC_BC.Z_Z                0 0.07692308         0         0
# 11   10 EC_BC.Z_Z                0 0.08333333         0         0
# 12   11 EC_BC.Z_Z                0 0.09090909         0         0
# 13   12 EC_BC.Z_Z                0 0.10000000         0         0
# 14   13 EC_BC.Z_Z                0 0.11111111         0         0
# 15   14 EC_BC.Z_Z                0 0.00000000         0         0
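
The function above averages over a centered window of plus or minus 7 days. If a trailing window of the last 7 calendar days is closer to what the question asks for, and moving sums are wanted as well as means, a variant along these lines may help. It is a sketch on made-up data: roll_by_days, its arguments and the toy vectors are hypothetical names, not part of the answer above.

```r
# For each row, apply `fun` to all y values dated within the last `days`
# calendar days, including the row's own date; NAs are dropped
roll_by_days <- function(dates, y, days = 7, fun = sum) {
  d <- as.numeric(dates)
  vapply(seq_along(d), function(i) {
    in_window <- d > d[i] - days & d <= d[i]
    fun(y[in_window], na.rm = TRUE)
  }, numeric(1))
}

# Made-up irregular series with a duplicated date, gaps and one NA
dates <- as.Date("2009-11-01") + c(0, 1, 2, 2, 5, 9)
mcnt  <- c(1, 0, 1, 1, NA, 1)

print(roll_by_days(dates, mcnt, days = 7, fun = sum))   # 1 1 3 3 3 1
print(roll_by_days(dates, mcnt, days = 7, fun = mean))  # 1 0.5 0.75 0.75 0.75 1
```

Because the window is defined on the dates themselves, duplicated and missing dates need no special handling, at the cost of a loop that is quadratic in the number of rows.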
